Abstract


The goal of program analysis is to determine automatically properties of the
run-time behavior of a program. Tools of software development, such as compilers,
program-verification systems, and program-comprehension systems, are in large part
based on program analyses. Most semantics-based program analyses model the run-time
behavior of a program as a trace of execution states and compute a property
of these states. Typically, this property is drawn from a predetermined language of
semantic information, such as aliasing descriptions or types of values. The standard
methodology of program analysis is to construct the property as a fixed point, a
single execution step at a time. We explain that these ubiquitous methodological
choices (the a priori choice of the describable program properties and the use of a
fixed-point computation) have some fundamental limitations and can result in poor
precision.

In this dissertation, we present a different approach to semantics-based program
analysis. Our methodology is based on transfer relations that precisely describe
the changes between the state of memory at one point during execution and the state
of memory at some later point in the execution. We isolate a language TR of
concise computer-representable presentations of transfer relations. We also give an
algorithm © that, given two transfer relations from TR, symbolically constructs
a third transfer relation in TR that is semantically equivalent to their relational
composition. An analysis designer begins by describing the operational semantics
of a source language as a set of TR-terms that precisely describe the atomic steps
of execution. Then an analysis algorithm repeatedly applies © to build a precise
run-time description of any finite control path of interest.

We show that TR is expressive enough to describe a wide variety of source-language
features, including heap-allocated mutable data structures, arrays, pointers,
and first-class functions. We then explain how our analysis methodology overcomes
some current limitations of program analysis. The transfer relations themselves
are useful program properties and would be difficult or impossible to formulate
with classical approaches to program analysis. But we also describe some classes of
analysis applications that are based on transfer relations. For instance, we explain
that the classical restriction of program analysis to building a property a single
execution step at a time can result in dramatic loss of precision, but may be overcome
by using © to compose multiple steps before applying a classical analysis. Furthermore,
we show how to compute precise properties of loops symbolically, avoiding
the inevitable imprecision of a fixed-point computation.

Introduction


Chapter  1 


Some  Topics  in  Program  Analysis 


The goal of program analysis is to determine automatically at compile time some properties
about the run-time behavior of a program. There are several major applications of program
analysis.

•  Compiler support. It is reasonably straightforward to implement a correct compilation
of a program from a high-level language to machine code, but it is not as easy to
implement a high-quality compilation. This is because the program may have a specialized
run-time behavior that the compiler could exploit, but this run-time behavior may
not be easy to detect from a simple examination of the code. Therefore, the compiler
must invoke a program analysis to uncover this run-time behavior. For instance, most
compilers use data-flow analysis (e.g., [KU76], [MJ81]) and alias analysis (e.g., [CWZ90],
[Lan91], [Deu94]) to enable classic optimizations such as common-subexpression elimination,
copy propagation, and hoisting of loop-invariant computations [ASU86]. Similarly,
some compilers for languages with first-class functions use a control-flow analysis (e.g.,
[JM79], [Shi91]) to construct a conservative control graph. Compiler support is far and
away the most common application of program analysis.

•  Program verification. One would like to check statically that a program will behave
properly at run time. For instance, an analysis might verify that a C program never
attempts to dereference a dangling pointer; or if it cannot verify a property that strong,
it might at least isolate a small number of potential trouble spots in the code. Also,
strongly typed languages such as Standard ML [MTH90] verify at compile time that a
program is well-typed and thus completely eliminate any possibility of a type error at run
time. Furthermore, static type-checking reveals at compile time a remarkable percentage
of programmer errors.

•  Program comprehension. A subject that has been gaining interest in recent years
is the use of program analysis to aid the human understanding of code. For instance,
the work in static debugging [Bou93a, Bou93b] allows the user to specify various kinds
of pre- and post-conditions at different points in the program, and then calculates the
corresponding information about the ranges of numeric variables. Also, program slicing
(e.g., [HRB90], [FRT95]) isolates the parts of a program that contribute to or depend on
a particular variable in the program chosen by the user.


This dissertation presents some new developments in the theory of program analysis. By
“theory of program analysis” we mean that we are concerned less with specific analysis problems
or specific applications of program analysis, and more with generic semantic tools that are
powerful and yet easy to apply to a variety of real programming languages and analysis tasks.


To put our goals into perspective, we compare them to the goals of abstract interpretation.
Abstract interpretation [CC77] is a general theory of semantics-based program analysis, so
general and wide-ranging that the theory itself intentionally does not provide explicit support for
particular language features, such as data structures and functions, or particular applications,
such as alias or data-shape analysis. A powerful methodology has been constructed around
this theory [CC79, Cou81, Cou90, CC92a, CC92b, CC92d, CC92c, CC94, CC95], including a
wide range of techniques for designing numeric lattices [Kar76, CH78, Gra89, Gra91a, Gra91b].
But when faced with a specific analysis task for a specific programming language, the analysis
designer is left largely on his own to cope with the overwhelming generality of the framework.
With a deep understanding and skillful use of the methodology, the results can be spectacular,
such as the storeless alias analysis of Deutsch [Deu92, Deu94]. But after 20 years, much of the
staggering potential of abstract interpretation still remains largely untapped.


In contrast, our methodology is designed around real language features, such as pointers,
heap-allocated data structures, arrays, assignment, and to a lesser extent first-class functions.
Consequently, although our framework does not have the same level of generality as abstract
interpretation, it is more straightforward to apply our tools to real languages and real analysis
tasks. We aim to strike a balance between analysis theory and analysis design. One of our
goals is to bring some of the power of semantics-based analysis techniques closer to the user.


To accomplish this, we have taken a step back in order to consider the task of program analysis
from a fresh perspective. This new perspective has uncovered some fundamental limitations
in the current methodology of program analysis, limitations that are manifest in real analyses.
By largely reworking semantics-based program analysis from the beginning, this dissertation
provides some technical answers to these basic limitations.


1.1  Limitations  of  Single-step  Abstract  Interpretation


We begin with an anecdote. Imagine that you are asked to report the sum to two decimal
places of the following list of numbers:

    $2.5548$
    $1.3475 \times 10^{-1}$
    $9.971$
    $3.802 \times 10^{-3}$
    $2.388 \times 10^{2}$
    $5.262$

Consider these two different approaches:

•  Algorithm A: Compute the exact sum of all six numbers and then round that sum to two
decimal places.

•  Algorithm B: Begin with 0, and then add the first number, round to two decimal places,
add the second number, round to two places, add the third, round again, and so forth.

Algorithm A is the procedure that naturally comes to mind for this task, and of course it returns
the correct answer of 256.73. In contrast, Algorithm B reports a result of 256.71, which is close
but not correct. Why would anyone choose this second approach? One can imagine that the
reduction in computation effort is worth the potential for accumulated rounding error.

In fact, these two algorithms are just the endpoints of a spectrum of possibilities. For
instance, one could first compute the precise sum of adjacent pairs of numbers in the list,
yielding a list of three exact partial sums:

    $2.5548 + 1.3475 \times 10^{-1} = 2.68955$
    $9.971 + 3.802 \times 10^{-3} = 9.974802$
    $2.388 \times 10^{2} + 5.262 = 2.44062 \times 10^{2}$

Then apply Algorithm B to this list, yielding a better but still not exact 256.72. This suggests
a general approach of rounding only every so often during the accumulation of the sum, where
Algorithm A is the extreme that rounds only at the very end, while Algorithm B is the other
extreme that rounds after every single number in the list.
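
The whole spectrum can be reproduced mechanically. The following Python sketch is illustrative
only: the names `round2` and `rounded_sum` and the `group` parameter are assumptions of this
example, and ties are assumed to round half up (no tie actually occurs for these six numbers).
A group size of six gives Algorithm A, a group size of one gives Algorithm B, and a group size
of two gives the pairwise variant.

```python
from decimal import Decimal, ROUND_HALF_UP

# The six numbers from the list above, written exactly.
NUMBERS = [
    Decimal("2.5548"),
    Decimal("1.3475E-1"),
    Decimal("9.971"),
    Decimal("3.802E-3"),
    Decimal("2.388E2"),
    Decimal("5.262"),
]


def round2(value):
    """Round to two decimal places (ties round half up, an assumption)."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def rounded_sum(numbers, group):
    """Add exactly within each run of `group` adjacent numbers, and round
    the running total after each run."""
    total = Decimal(0)
    for start in range(0, len(numbers), group):
        exact_chunk = sum(numbers[start:start + group], Decimal(0))
        total = round2(total + exact_chunk)
    return total


algorithm_a = rounded_sum(NUMBERS, group=len(NUMBERS))  # round once, at the end
algorithm_b = rounded_sum(NUMBERS, group=1)             # round after every number
pairwise = rounded_sum(NUMBERS, group=2)                # round after every pair
print(algorithm_a, algorithm_b, pairwise)  # 256.73 256.71 256.72
```

The only thing that varies between the three results is how many exact additions happen
between consecutive roundings.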


This simple discourse on how to compute rounded sums illustrates by analogy a remarkably
important limitation of program analyses. As a very simple example, consider the following
program.

    while n > 0 do
    {
        y := x - 3;
        x := y + 5;
        n := n - 1
    }

Suppose that at any point during an execution of this program, the variable bindings are
described by an environment

    $\rho \in \mathit{Env} = \mathit{Var} \to \mathit{Int}$

Say we wish to determine the variable bindings at the termination of this program, given an
environment $\rho_0$ describing the initial bindings. Because this program always terminates (as
long as n, x, and y are bound in $\rho_0$), we can in fact just execute the program and return the
final environment as the answer.

By analogy, an entire environment $\rho$ corresponds to a real number, and the execution of a
single step of the program (which may modify the environment) corresponds to the accumulated
addition of one number in the list.¹ Thus, executing the program corresponds to accumulating
the exact sum of a list of real numbers (starting with 0). The length of the list is the total
number of execution steps, which in this case is always finite, but may be quite long.

But suppose that all we want to know about each variable at the end of execution is
information about its sign, expressed as one of the following properties of integers ordered by
implication (in other words, sets of integers ordered by inclusion).

                  int
                /     \
           nonpos     nonneg
           /    \     /    \
        neg      zero      pos
           \      |      /
                 none

Given an environment $\rho$, one can abstract $\rho$ by a sign environment $\hat{\rho}$ such that
$(\hat{\rho}\,x)$ is the sign (either neg, zero, or pos) of $(\rho\,x)$ for all variables x.

    $\hat{\rho} \in \widehat{\mathit{Env}} = \mathit{Var} \to \mathit{Sign}$

By analogy, $\hat{\rho}$ corresponds to the “rounding” of $\rho$. Again, we can just execute the program
and “round” the final environment to a sign environment. This is analogous to Algorithm A,
and it will always return the strongest properties.
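
As a concrete illustration of this "execute, then round" procedure, the following Python
sketch runs the example loop on one initial environment and abstracts only the final result.
The function names and the particular initial bindings are hypothetical choices made for this
example.

```python
def run(env):
    """Execute the example loop on a concrete environment; return the final one."""
    env = dict(env)  # work on a copy so the caller's environment is untouched
    while env["n"] > 0:
        env["y"] = env["x"] - 3
        env["x"] = env["y"] + 5
        env["n"] = env["n"] - 1
    return env


def sign(value):
    """Sign of a single integer: neg, zero, or pos."""
    return "neg" if value < 0 else "zero" if value == 0 else "pos"


def abstract(env):
    """Abstract a concrete environment by a sign environment."""
    return {var: sign(value) for var, value in env.items()}


rho0 = {"x": 1, "y": 1, "n": 3}  # hypothetical initial bindings
final = run(rho0)
print(final)            # {'x': 7, 'y': 2, 'n': 0}
print(abstract(final))  # {'x': 'pos', 'y': 'pos', 'n': 'zero'}
```

Because no information is discarded before the last step, the reported signs are exact for
this particular initial environment.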

It is well known, however, that this process is infeasible in general. For one, the program
may take a long time to execute. Even worse, we may not know the exact initial environment
$\rho_0$. Finally, some programs do not terminate, and even if we do know $\rho_0$ beforehand, it is
impossible to determine effectively if the execution will eventually halt. So in general we must
settle for some approximation of the result.

The standard approach to program analysis is essentially to perform Algorithm B, abstracting
at each step. For our example program, the first three steps would produce the following
sign environments from the initial sign environment shown below:

                                        [x,y,n ↦ pos, pos, nonneg]
    while n > 0 do
    {
                                        [x,y,n ↦ pos, pos, pos]
        y := x - 3;
                                        [x,y,n ↦ pos, int, pos]
        x := y + 5;
                                        [x,y,n ↦ int, int, pos]
        n := n - 1
    }

¹ This analogy has the disadvantage that a real number corresponds to both an environment and a single-step
transformation between environments; it is crucial to distinguish these two very different concepts.

Before the first assignment, all that is known about x is that it is positive, but the analysis
must calculate x - 3 to determine the value of y. The exact answer is the set of all positive
integers decremented by 3, which is the set {-2, -1, 0, ...}, but the abstraction “rounds” that
set to the smallest enclosing element in the sign lattice, which is int. Now, in the next step, all
that is known about y is that it is an integer, and so y + 5 is the set of all integers incremented
by 5, which is again the set of integers. So the abstract value of x in the next step is int.

However, a little bit of thought reveals that x is actually guaranteed to be positive after
the second assignment. The analysis has already lost this information because of
the abstraction, or “rounding error”, between the two assignments. If the set {-2, -1, 0, ...}
had not been abstracted to int, then its increment by 5 in the next step would yield the set
{3, 4, ...}, whose abstraction is pos. Thus, there are two ways to achieve better results.


1.  Do not abstract between the first and second assignments.

2.  Abstract after every step as usual, but beforehand enrich the lattice of integer properties
with an element corresponding to {-2, -1, 0, ...}, so that the abstraction of this
intermediate property loses no information.
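
The difference that the first option makes can be worked out for the variable x alone. The
calculation below is a sketch, not part of the original development. It assumes the notation
$\alpha$ for rounding a set of integers up to the smallest enclosing element of the sign lattice
and $\gamma$ for the set of integers that a sign denotes (both symbols are used in the same roles
in Section 1.2), and it writes $S + c$ for the set obtained by adding $c$ to every member of $S$.

```latex
\begin{align*}
\text{abstracting after each assignment:}\quad
  \alpha\bigl(\gamma(\alpha(\gamma(\mathit{pos}) - 3)) + 5\bigr)
  &= \alpha\bigl(\gamma(\alpha(\{-2,-1,0,\ldots\})) + 5\bigr) \\
  &= \alpha\bigl(\gamma(\mathit{int}) + 5\bigr)
   = \mathit{int} \\[4pt]
\text{abstracting once, after both:}\quad
  \alpha\bigl((\gamma(\mathit{pos}) - 3) + 5\bigr)
  &= \alpha(\{3,4,\ldots\})
   = \mathit{pos}
\end{align*}
```

The only difference between the two lines is the extra $\gamma \circ \alpha$ between the two
assignments, which is exactly where the information about x is lost.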


The first approach seems promising, but is not in the current repertoire of program analysis
techniques. Most of this dissertation develops a general foundation that one may use for this
approach; we will return to it shortly.

The second approach seems absurd from a practical standpoint and troublesome from a theoretical
standpoint. It clearly does not generalize. For instance, elements such as {-2, -1, 0, ...}
are clearly ad hoc and dependent on the particular run-time behavior of a program. Probably
many new elements would be needed for a reasonably sized program, and for anything more
sophisticated than a sign analysis, the space from which these elements may be chosen becomes
much more complex and rich. Even if one could isolate a small set of useful specialized elements
with which to enrich the property lattice for a given program, it seems difficult to determine
which properties would be the most useful without actually running the program itself. Nevertheless,
there are examples of practical program analyses that essentially use this idea in a limited
capacity for lack of any other solution; we give an example at the end of this section.

To continue with the analysis of this program, there is the additional complication that the
execution length (corresponding to the length of the list of numbers) may be unbounded, and
so an analysis will typically use some “folding” strategy, usually at every program point, and
compute the solution as an iterative fixed-point calculation. For example, the next step above
is to calculate pos - 1 = nonneg for n, and then to join the resulting sign environment with the
old environment at the loop entry, weakening the properties of x and y to int. The analysis
reaches the following fixed point after a second iteration through the loop:

                                        [x,y,n ↦ int, int, nonneg]
    while n > 0 do
    {
                                        [x,y,n ↦ int, int, pos]
        y := x - 3;
                                        [x,y,n ↦ int, int, pos]
        x := y + 5;
                                        [x,y,n ↦ int, int, pos]
        n := n - 1
                                        [x,y,n ↦ int, int, nonneg]
    }
                                        [x,y,n ↦ int, int, zero]

The last environment is the answer. But the most precise answer (corresponding to the “correct”
rounded sum) is

    [x,y,n ↦ pos, int, zero].

As we have suggested, the reason that the analysis reported the final sign of x as int
instead of pos is because it used the equivalent of Algorithm B, which is the extreme approach
of abstracting at every step. Algorithm A is at the other extreme, which as we have explained
is uncomputable for program analysis. But what about the intermediate approach of “rounding
only every so often”? To understand how that applies to program analysis, consider rewriting
the program to use a parallel assignment:

    while n > 0 do
    {
        x, y, n := x + 2, x - 3, n - 1
    }

Now apply the approach of Algorithm B:

                                        [x,y,n ↦ pos, int, nonneg]
    while n > 0 do
    {
                                        [x,y,n ↦ pos, int, pos]
        x, y, n := x + 2, x - 3, n - 1
                                        [x,y,n ↦ pos, int, nonneg]
    }
                                        [x,y,n ↦ pos, int, zero]

This returns the most precise answer possible. Note how this approach was able to determine
the precise result for x. Before the assignment, x is pos. So, x + 2 is the set {3, 4, ...}, which
is then abstracted to pos.

Technically, we are still abstracting after every step and using the same sign analysis to
do it, but by rewriting the three sequential instructions into a single parallel instruction, we
are in effect abstracting only after every third step. Recall that in our analogy, a real number
corresponds both to an environment (the accumulated result) and a single-step transition
between environments (an element of the list). Here, the transitions are done by assignment
statements, so this transformation from multiple sequential statements to a single parallel
statement corresponds to adding groups of adjacent numbers in the list before applying Algorithm
B.
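
The effect of abstracting after every assignment versus once per loop iteration can be
reproduced with a small fixed-point computation. The Python sketch below is an illustration
under stated assumptions rather than the analysis developed in this dissertation: each sign is
taken to denote an interval of integers, those intervals stand in for the "exact" sets between
roundings (which happens to suffice for this loop), and all function and variable names are
hypothetical.

```python
import math

INF = math.inf

# Each sign denotes an interval of integers; None is the empty set.
GAMMA = {
    "none": None,
    "zero": (0, 0),
    "neg": (-INF, -1),
    "pos": (1, INF),
    "nonpos": (-INF, 0),
    "nonneg": (0, INF),
    "int": (-INF, INF),
}
# Non-empty signs, listed so that smaller elements come before larger ones.
BY_SIZE = ["zero", "neg", "pos", "nonpos", "nonneg", "int"]


def alpha(interval):
    """Round an interval up to the smallest enclosing sign."""
    if interval is None:
        return "none"
    lo, hi = interval
    for name in BY_SIZE:
        sign_lo, sign_hi = GAMMA[name]
        if sign_lo <= lo and hi <= sign_hi:
            return name


def join(a, b):
    """Least upper bound of two signs."""
    ia, ib = GAMMA[a], GAMMA[b]
    if ia is None:
        return b
    if ib is None:
        return a
    return alpha((min(ia[0], ib[0]), max(ia[1], ib[1])))


def meet(a, b):
    """Greatest lower bound of two signs."""
    ia, ib = GAMMA[a], GAMMA[b]
    if ia is None or ib is None:
        return "none"
    lo, hi = max(ia[0], ib[0]), min(ia[1], ib[1])
    return alpha((lo, hi)) if lo <= hi else "none"


# Loop body as (target, source, constant), meaning target := source + constant.
BODY = [("y", "x", -3), ("x", "y", 5), ("n", "n", -1)]


def run_body(signs, abstract_every_step):
    """Push a sign environment through the loop body."""
    intervals = {var: GAMMA[s] for var, s in signs.items()}
    for target, source, constant in BODY:
        src = intervals[source]
        intervals[target] = None if src is None else (src[0] + constant, src[1] + constant)
        if abstract_every_step:  # "round" after each assignment
            intervals = {var: GAMMA[alpha(iv)] for var, iv in intervals.items()}
    return {var: alpha(iv) for var, iv in intervals.items()}


def analyze(initial, abstract_every_step):
    """Iterate to a fixed point at the loop head; return the environment at exit."""
    head = dict(initial)
    while True:
        inside = dict(head, n=meet(head["n"], "pos"))        # the test n > 0 holds
        after = run_body(inside, abstract_every_step)
        new_head = {var: join(head[var], after[var]) for var in head}
        if new_head == head:                                   # fixed point reached
            return dict(head, n=meet(head["n"], "nonpos"))    # the test n > 0 fails
        head = new_head


initial = {"x": "pos", "y": "pos", "n": "nonneg"}
single_step = analyze(initial, abstract_every_step=True)
multi_step = analyze(initial, abstract_every_step=False)
print(single_step)  # {'x': 'int', 'y': 'int', 'n': 'zero'}
print(multi_step)   # {'x': 'pos', 'y': 'int', 'n': 'zero'}
```

Both runs use the same lattice and the same fixed-point loop; only the placement of the
rounding inside `run_body` differs, and that alone decides whether x is reported as int or pos.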


The above is of course merely a toy example. But it is not hard to find examples in real
program analyses that suffer from this same phenomenon of abstracting after every step. For
instance, Ghiya and Hendren describe in [GH96] a shape analysis that attempts to determine
whether data structures in a C program are trees, dags, or graphs. Their paper describes a
difficulty with their analysis:

If a data structure temporarily becomes dag-like or cyclic and then becomes tree-like
again, shape analysis cannot detect this, and continues to report its shape as
dag-like or cyclic. The benchmark reverse that recursively swaps [the children of] a
binary tree represents this case.

Although shape analysis for C is quite a bit more complex than a sign analysis for a simple
arithmetic while-loop language, it turns out that the difficulty that Ghiya and Hendren described
is precisely the same phenomenon that caused the sign analysis above to fail to detect that x
is always positive. The fundamental reason that they cannot detect those temporary changes
of shape is that they abstract at every step. In their case, they abstract a C memory state by
a “direction matrix” and an “interference matrix”; and whereas our problem in the program
above was that our lattice of sign properties could not precisely express the set {-2, -1, 0, ...}
that came up after the second step, their problem is that their abstract store cannot express
many of the possible forms of non-tree or non-dag shapes that may arise temporarily during
execution.

This is a problem not just with Ghiya and Hendren’s shape analysis. At the same conference,
Sagiv, Reps, and Wilhelm presented a shape analysis that attempts to address these
issues [SRW96]. They point out that:

The  third  and  fourth  common  list-manipulation  operations — splicing  a  new  element 
into  a  list  and  removing  an  element  from  a  list — can,  in  many  cases,  be  handled 
accurately  by  our  shape-analysis  algorithm,  even  if  shape-nodes  temporarily  become 
shared!

But they, too, abstract after every single step. In order to achieve good results for some
programs that temporarily alter data shapes, they instead chose to design a rather unusual
abstraction of a memory state that can actually express certain kinds of temporary shape
alterations that might arise in common programs. In spirit, their solution is the second of the
two approaches listed earlier in this section (enriching the lattice beforehand). As we
suggested there, this approach does not generalize very well and is necessarily limited at
the outset; this is indeed the case for their analysis. Because of their specialized lattice design,
their analysis determines very little information about any program that allocates at least one
pointer that is at some point shared (pointed to by more than one distinct location in memory)
and not itself the binding of any variable. Clearly, this eliminates a great many programs
from consideration: for instance, any program that creates a doubly-linked list or any kind of
dag-like structure. In contrast, the Ghiya/Hendren analysis is not nearly so limited.

Our  claim  is  that  the  methodology  of  abstracting  at  every  step  is  a  ubiquitous  and  serious 
limitation  of  current  program-analysis  methodology.  To  understand  why,  we  will  revisit  abstract 
interpretation,  the  root  of  semantics-based  program  analysis,  in  Chapter  8.  This  dissertation 
will  provide  a  solution,  which  we  will  outline  in  Section  1.3. 

1.2  Overuse  of  Abstraction  and  Fixed-point  Computation 

Our  discussion  of  the  sign  analysis  in  the  previous  section  centered  around  how  to  deal  with  a 
single  loop  iteration.  We  only  touched  upon  the  “folding”  process  that  was  necessary  to  deal 
effectively  with  the  unbounded  execution  length  of  the  program.  The  issue  of  how  to  cope 
with  infinite  execution  sequences  is  of  primary  importance  in  program  analysis,  and  almost  all 
analyses  use  a  similar  technique  of  computing  a  fixed  point  over  an  abstract  semantic  domain 
(sign  environments  in  the  above  example). 

Our claim is that this technique is rather ubiquitous and yet not well suited for many analysis
tasks. The cause of this state of affairs is, perhaps surprisingly, strongly related to the cause
of the problem described in the previous section: that analyses cannot take multiple steps of
execution between abstractions. Fortunately, the solutions to these two problems are closely
related, as well, and in this dissertation we develop the foundations for both.

In  Chapter  8  we  will  see  that  the  foundation  of  semantics-based  program  analysis  is  based 
on  an  observation  that  a  semantics  of  a  language  is  usually  expressed  using  a  fixed  point  whose 
iterative  calculation  corresponds  in  some  sense  to  the  execution  steps  of  the  program.  For 
instance,  consider  the  common  form  of  operational  semantics  as  a  transition  system,  in  which 
program  execution  is  modeled  by  the  single-step  transitions  from  machine  state  to  machine 
state.  This  kind  of  semantics  is  particularly  useful  for  program  analysis  because  it  expresses 
many  intensional  details  of  execution  that  might  be  of  interest  to  analyze;  one  might  say  that 
it  is  “close  to  the  iron”,  in  comparison  to  a  more  extensional  semantics  such  as  a  standard 
denotational  model  that  only  maps  program  input  to  program  output.  We  will  say  more  about 
this  in  Chapter  4. 

For now, we are not so much concerned with the appropriateness of a particular semantic
model for the purpose of program analysis, but rather we wish to illustrate that semantic
models of programming languages typically use fixed points that reflect program execution. For
example, a transition system of a particular program P will have a binary transition relation

    $\longmapsto \;\subseteq\; \mathit{State} \times \mathit{State}$

specifying the pairs of states that may be adjacent in an execution of that program. Then the
semantics $\mathcal{N}[P]$ of program P is defined as an unfolding of this relation into a set of unbounded
sequences (where $\Psi.\psi$ denotes the extension of state sequence $\Psi$ by state $\psi$):

    $\dfrac{\Psi.\psi \in \mathcal{N}[P] \qquad \psi \longmapsto \psi'}{\Psi.\psi.\psi' \in \mathcal{N}[P]}$

If this rule is solved inductively from a base set of initial states, its iterative solution yields all
finite execution prefixes.²

One can rephrase the iterative solution of the above rule as the repeated application of a
function

    $\mathcal{S}[P] \in \mathcal{P}(\mathit{State}^*) \to \mathcal{P}(\mathit{State}^*)$

that, given a partial solution of $\mathcal{N}[P]$, applies the above rule once to enlarge $\mathcal{N}[P]$ by a single
execution step.
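
To make the unfolding concrete, the following Python sketch iterates such a one-step extension
function on a tiny transition system. Everything in it is a hypothetical illustration: a state
is taken to be just the value of n in the loop `while n > 0 do n := n - 1`, and the names
`successors`, `extend_once`, and `finite_prefixes` are assumptions of this example.

```python
def successors(n):
    """Hypothetical transition relation: from state n, one loop iteration
    leads to n - 1; a state with n <= 0 has no successor."""
    return {n - 1} if n > 0 else set()


def extend_once(sequences, successors):
    """Apply the inference rule once: extend each known prefix by one step."""
    extended = {
        seq + (nxt,)
        for seq in sequences
        for nxt in successors(seq[-1])
    }
    return frozenset(sequences) | extended


def finite_prefixes(initial_states, successors, max_rounds):
    """Iterate `extend_once` from the one-state sequences, stopping at a fixed
    point or after `max_rounds` applications, whichever comes first."""
    sequences = frozenset((state,) for state in initial_states)
    for _ in range(max_rounds):
        bigger = extend_once(sequences, successors)
        if bigger == sequences:
            break
        sequences = bigger
    return sequences


prefixes = finite_prefixes({2}, successors, max_rounds=10)
print(sorted(prefixes))  # [(2,), (2, 1), (2, 1, 0)]
```

For a program that does not terminate, the iteration never stabilizes, which is why the sketch
takes an explicit bound on the number of rounds.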

A program analysis based on this transition-system semantics must analyze these potentially
unbounded sequences. For instance, suppose that in our sign analysis above, a state comprises
a control point specifying the line of the program to be executed next and an environment
specifying the current variable bindings.

    $\mathit{State} = \mathit{CtrlPoint} \times \mathit{Env}$

The analysis that we described informally above can now be formalized as an iteration of

    $(\alpha \circ \mathcal{S}[P] \circ \gamma) \in \widehat{\mathit{State}} \to \widehat{\mathit{State}}$

until a fixed point is reached, where

    $\gamma \in \widehat{\mathit{State}} \to \mathcal{P}(\mathit{State}^*)$
    $\alpha \in \mathcal{P}(\mathit{State}^*) \to \widehat{\mathit{State}}$

and

    $\widehat{\mathit{State}} = \mathit{CtrlPoint} \to \widehat{\mathit{Env}}.$


Here, a member of $\widehat{\mathit{State}}$ is a table of abstract environments indexed by control point, just as
we showed next to the program in the examples of Section 1.1. The function $\gamma$, given such a
table, describes the set of all execution sequences whose states satisfy the properties given in
that table. The function $\alpha$ abstracts a set of execution sequences by a table giving the strongest
sign properties of the states in those sequences.

² There are similar ways to express the infinite executions of a program via coinduction, but for the sake of
simplicity we leave the reader to [CC92b] for a discussion. We do note, however, that the use of coinduction for
program analysis is a powerful technique, especially for the analysis of errors, that is currently not well appreciated.
For examples, see [Bou93a].

In the analogy of Section 1.1 in which we compared program execution and analysis to the
accumulation of the rounded sum of a list of numbers, a set of execution sequences corresponds
to an “exact” real number, and a member of $\widehat{\mathit{State}}$ corresponds to a “rounded” real number.
The function $\gamma$ is given a rounded number representing the accumulated sum at some point in
the middle of the list. Conceptually, $\gamma$ “coerces” this number into an exact number by adding
zeroes onto the end. Then $\mathcal{S}[P]$ corresponds to adding the next number in the list to this
sum, and $\alpha$ rounds the resulting sum, usually losing information. The program analysis repeats
this process until it reaches a fixed point (which does not have a clear analog in our list-sum
anecdote).

Almost  every  kind  of  program  analysis  is  based  on  a  similar  notion  of  fixed-point  calculation 
over  an  abstraction  of  the  properties  of  interest.  This  is  not  always  apparent,  because  many 
analysis  frameworks,  such  as  data-flow  analysis  [MJ81],  type  inference  [KMP84],  and  constraint- 
based  analysis  [Hei92,  AWL94],  are  phrased  in  terms  of  systems  of  equations  or  inference  rules. 
But  most  of  these  frameworks  reduce  to  a  fixed-point  calculation  whose  iterations  correspond 
in  some  sense  to  abstract  execution  steps  of  the  program.  Abstract  interpretation  is  a  fixed- 
point-based  theory  that  unifies  these  seemingly  disparate  approaches. 

In  Section  1.1  we  explained  that  this  methodology  of  abstracting  after  every  step  can  cause 
severe  precision  problems  with  the  analysis.  In  our  small  while-loop  example,  we  illustrated 
this  problem  by  rewriting  the  three  individual  assignments  in  the  loop  body  as  a  single  parallel 
assignment.  In  Chapter  8  we  will  go  further  into  that  topic,  but  for  now  we  suggest  that  a 
multi-step  program  analysis  might  amount  to  finding  the  fixed  point  of 

\[ (\alpha \circ \mathcal{S}[\![P]\!] \circ \mathcal{S}[\![P]\!] \circ \mathcal{S}[\![P]\!] \circ \gamma) \in State \to State \]

instead of the above function that takes only a single step between applications of the abstraction
function $\alpha$. The problem is that there is no general methodology to develop program analyses
that  have  this  kind  of  flexibility.  But  we  have  developed  such  a  methodology,  which  we  outline 
in  Section  1.3. 
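
The effect of abstracting after every step, as opposed to after several steps, can be sketched with the list-sum analogy. The following is only an illustration under stated assumptions: the function names `round_to_ten` and `accumulate`, and the choice of rounding to the nearest multiple of ten, are hypothetical and do not come from the dissertation.

```python
def round_to_ten(n: int) -> int:
    """Hypothetical 'abstraction': round a non-negative integer to the nearest multiple of 10."""
    return ((n + 5) // 10) * 10


def accumulate(numbers, steps_per_round):
    """Sum `numbers`, rounding the running total after every `steps_per_round` additions."""
    total = 0
    for i, n in enumerate(numbers, start=1):
        total += n                       # one exact "execution step"
        if i % steps_per_round == 0:
            total = round_to_ten(total)  # abstract, possibly losing information
    return total


numbers = [4, 4, 4, 4, 4, 4]             # the exact sum is 24
single_step = accumulate(numbers, 1)     # round after every addition
multi_step = accumulate(numbers, 3)      # round after every third addition
print(single_step, multi_step)
```

Rounding after every addition discards each small increment, whereas rounding after three additions retains most of the sum; this mirrors the precision gained by taking several execution steps between abstractions.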

Now  we  may  make  the  following  key  insight.  Once  one  has  a  methodology  to  perform  any 
number  of  steps  between  abstractions,  the  need  to  perform  the  abstractions  and  compute  the 
fixed  point  often  evaporates. 

For  instance,  shape  analyses  are  often  concerned  with  detecting  computations  that  are 
shape-preserving.  It  is  common  for  the  success  or  failure  of  a  shape  analysis  to  be  measured  by 
how  well  it  analyzes  routines  such  as  list-insert,  list-delete,  node  swapping,  and  so  forth.  For 
instance,  one  would  like  to  determine  that  a  routine  that  destructively  inserts  a  node  into  a 
linked  list  preserves  the  invariant  that  the  structure  upon  which  it  operates  has  the  shape  of  a 
list.  Routines  such  as  these  typically  take  more  than  one  instruction,  but  still  a  finite  number 
of them. Why would they need an iterative fixed-point calculation to compute their shape-preserving
properties? The answer is that they do not, but because the present methodology of
program analysis does not offer any way to combine multiple execution steps, a shape analysis
has no choice but to compute a global fixed point, as we did for the sign properties in Section 1.1.


1.3  An  Introduction  to  Our  Methodology 

This dissertation develops a foundation for a new methodology of program analysis that addresses
the problems that we have described above. This foundation is based on a semantic
methodology of programming languages in which it is possible to compute a simple term
describing the net effect of any given finite execution path.

In Section 1.2 we suggested that an operational semantics based on a transition system
between execution states is particularly useful for program analysis. For the example program
in Section 1.1, an execution state was a pair of a control point and an environment. In general,
environments are not expressive enough because they cannot express pointers and other kinds
of mutable data structures.

In order to address a wide variety of languages, we introduce the notion of a store. A store
is similar to an environment in that it maps variables to values, but it also maps references to
values. A reference is a pair of two values; the reference $(v, v')$ is written $v.v'$ and represents
component $v'$ of data structure $v$. Actually, it is convenient to think of a store as a graph
whose nodes are values and whose edges are labeled by values. Then $v$ is the root node of some
data structure (record, pointer, array, and so on), and its outgoing edges point to its mutable
subcomponents, labeled by their names $v'$ (field names, the C * token, integer array indices,
and so forth). An l-value is an object that may be dereferenced in a store; it is either a variable
$x$ or a reference $v.v'$. A store is then a map from l-values to values.

\[ \sigma \in Store = Lval \to Val \]

\[ Lval = Var \cup (Val \times Val) \]

The set Val of values is left unspecified because different languages will need different values.
We consider this parameterized notion of a store, however, to be common to all languages.

More specifically, the techniques in this dissertation apply to any language in which the
execution(s) of a program can be expressed as a transition relation

\[ \longmapsto \;\subseteq\; State \times State \]

where

\[ State = CtrlPoint \times Store \]

for some set CtrlPoint of static control points and some set Val of values.

Usually, $\longmapsto$ is defined by meta-rules that specify how the individual pieces of program
syntax induce transitions. For instance, one might imagine the following rule for variable
assignments.

\[ (x := e;\ t,\ \sigma) \longmapsto (t,\ \sigma[x \mapsto \mathcal{E}[\![e]\!]\sigma]) \]

Here, $e$ is a basic expression, and $\mathcal{E}[\![e]\!]\sigma$ denotes the value to which $e$ evaluates in store $\sigma$. The
core idea of our technique is to replace these meta-rules with computer-representable composable
descriptions.

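Our first observation is the isomorphism

\[ \mathcal{P}(State \times State) \;\cong\; CtrlPoint \times CtrlPoint \to \mathcal{P}(Store \times Store). \]

This means that a transition relation $\longmapsto$ is equivalent to a table of binary relations on stores,

\[ \stackrel{C,C'}{\longmapsto} \]

indexed by pairs of control points. We write the $(C, C')$ entry in this table as $\stackrel{C,C'}{\longmapsto}$, and this
relation defines the possible store changes in a single step from $C$ to $C'$. Thus,

\[ (C, \sigma) \longmapsto (C', \sigma') \quad\text{iff}\quad \sigma \stackrel{C,C'}{\longmapsto} \sigma'. \]

For example, one can rewrite the above meta-rule as

\[ \sigma \stackrel{(x := e;\ t),\,t}{\longmapsto} \sigma[x \mapsto \mathcal{E}[\![e]\!]\sigma] \]

or, alternatively, as the definition

\[ \stackrel{(x := e;\ t),\,t}{\longmapsto} \;=\; \{(\sigma, \sigma') \mid \sigma' = \sigma[x \mapsto \mathcal{E}[\![e]\!]\sigma]\}. \]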
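
The correspondence between a transition relation on states and a table of store relations indexed by control-point pairs can be sketched on a small finite example. This is only an illustration: the names `to_table` and `from_table`, the control-point labels, and the opaque store names are hypothetical stand-ins, and real stores would not be plain strings.

```python
from collections import defaultdict


def to_table(transitions):
    """Split a set of ((C, s), (C2, s2)) transitions into a table keyed by (C, C2)."""
    table = defaultdict(set)
    for (c, s), (c2, s2) in transitions:
        table[(c, c2)].add((s, s2))   # the (C, C2) entry is a binary relation on stores
    return dict(table)


def from_table(table):
    """Rebuild the transition relation on states from the table of store relations."""
    return {((c, s), (c2, s2))
            for (c, c2), relation in table.items()
            for (s, s2) in relation}


# Hypothetical control points "L1".."L3" and opaque store names "s0".."s3".
transitions = {
    (("L1", "s0"), ("L2", "s1")),
    (("L1", "s2"), ("L2", "s3")),
    (("L2", "s1"), ("L3", "s1")),
}
table = to_table(transitions)
print(table[("L1", "L2")])
```

Converting to the table and back yields the original relation, which is the sense in which the two presentations carry the same information.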

We call a binary relation on stores a transfer relation. A transfer relation describes a way in
which a store evolves during execution. For example, $\stackrel{C,C'}{\longmapsto}$ is a transfer relation that describes
how the store changes in a single step from $C$ to $C'$. A nice property of transfer relations is
that one may compose them to express multiple steps of execution. For instance,

\[ \stackrel{C,C'}{\longmapsto} \;;\; \stackrel{C',C''}{\longmapsto} \]

is a transfer relation that expresses how a store changes in an execution that begins at control
point $C$, progresses in one step to $C'$, and then progresses in the next step to $C''$. Here, the
symbol $;$ is the relation composition operator. In this manner, one can build the transfer
relation for any finite control path.
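
Relational composition itself is easy to state over finite relations. The sketch below is a set-based illustration, not the symbolic algorithm developed later in the dissertation; the name `compose` and the encoding of a store as the integer value of a single variable are assumptions made only for this example.

```python
def compose(r1, r2):
    """Relational composition: (a, c) is included when some b has (a, b) in r1 and (b, c) in r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


# Each "store" is modeled here as the integer value of a single variable x.
step_1 = {(n, n + 1) for n in range(4)}      # x := x + 1, on initial values 0..3
step_2 = {(n, 2 * n) for n in range(6)}      # x := 2 * x, on values 0..5
two_steps = compose(step_1, step_2)          # net effect of both steps
print(sorted(two_steps))
```

The composed relation relates each initial store directly to the store two steps later, without mentioning the intermediate store.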

Above, we said that our central approach is to replace the meta-rules of the transition system
with computer-representable composable relations: computer-representable because they will
be directly manipulated and examined by a program analysis, and composable because we
want a flexible way of processing multiple execution steps in the analysis before abstracting the
result, as we explained in the example of Section 1.1 and more generally in Section 1.2.

Let us examine this more closely. As we explained in Section 1.2, an algorithm for analyzing
program $P$ works by iteratively applying an abstract step function

\[ (\alpha \circ \mathcal{S}[\![P]\!] \circ \gamma) \in State \to State \]


where State is a set of abstract properties of state sequences (such as the signs of the numeric
values occurring in the states). Applying this function to an abstract property in State uses the
transition relation $\longmapsto$ to extend by one step every execution sequence consistent with that
property (as given by $\gamma$) and abstracts the resulting set of execution sequences with $\alpha$, in general
losing information (i.e., weakening the property) in the process.

However,  this  function  cannot  be  implemented  in  these  three  stages.  It  is  not  possible  for  a 
program-analysis  algorithm  to  manipulate  the  probably  infinite  sets  of  states  or  state  sequences. 
Instead,  a  program  analysis  performs  this  three-stage  operation  in  a  monolithic  fashion,  where 
$\alpha$ and $\gamma$ are “baked into” the transition relation $\longmapsto$ that forms the core of $\mathcal{S}[\![P]\!]$.

For  example,  consider  again  our  example  meta-rule  for  variable-assignment  transitions: 

\[ (x := e;\ t,\ \sigma) \longmapsto (t,\ \sigma[x \mapsto \mathcal{E}[\![e]\!]\sigma]) \]

The  program  analysis  designer  will  hand-design  an  algorithm  that  “abstractly”  performs  these 
transitions.  For  instance,  if  State  is  the  set  of  tables  of  sign  environments  indexed  by  control 
point, as given in Section 1.2, then a straightforward algorithm to compute $(\alpha \circ \mathcal{S}[\![P]\!] \circ \gamma)$ will
be  hard-wired  to  propagate  the  sign  property  of  expression  e  at  control  point  (x  :=  e;  t)  to 
variable  x  at  control  point  t  for  each  variable  assignment  in  P.  This  makes  intuitive  sense — 
the  algorithm  is  “abstractly  interpreting”  the  variable  assignments.  But  of  course  the  analysis 
designer  should  justify  these  intuitions  by  proving  that  the  algorithm  actually  implements  this 
function. 
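
A hand-designed abstract transition of this kind might look like the following sketch. It is only an illustration under stated assumptions: the four-point sign domain, the tuple encoding of expressions, and the names `abstract_eval` and `abstract_assign` are hypothetical and are not the dissertation's definitions.

```python
NEG, ZERO, POS, TOP = "-", "0", "+", "?"   # TOP means "sign unknown"


def sign_of(n):
    """Sign of a concrete integer constant."""
    return NEG if n < 0 else ZERO if n == 0 else POS


def add_signs(a, b):
    """Abstract addition on signs: precise only when the operands cannot cancel."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else TOP


def abstract_eval(expr, env):
    """Sign of an expression encoded as ("const", n), ("var", x), or ("add", e1, e2)."""
    kind = expr[0]
    if kind == "const":
        return sign_of(expr[1])
    if kind == "var":
        return env.get(expr[1], TOP)
    if kind == "add":
        return add_signs(abstract_eval(expr[1], env), abstract_eval(expr[2], env))
    raise ValueError(f"unknown expression: {expr!r}")


def abstract_assign(env, x, expr):
    """Abstractly interpret x := expr: propagate the sign of expr to x."""
    new_env = dict(env)
    new_env[x] = abstract_eval(expr, env)
    return new_env


env = {"x": POS, "y": NEG}
after = abstract_assign(env, "x", ("add", ("var", "x"), ("const", 1)))
mixed = abstract_assign(env, "y", ("add", ("var", "x"), ("var", "y")))
print(after, mixed)
```

Note that the abstraction is fixed inside `abstract_assign`: the function has no way to postpone the loss of information until after a later assignment, which is the second disadvantage listed next.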

Note  that: 

1.  To  apply  an  existing  analysis  to  a  different  language,  one  must  separately  hand-design  a 
new  algorithm  for  the  meta-rules  of  that  language.  This  is  an  engineering  disadvantage. 

2.  Because the abstraction is “baked into” the analysis algorithm, there is no way to perform
multiple execution steps before abstracting the result. This is a more serious disadvantage because,
as we have explained, it can have devastating effects on the quality of the analysis.

We  now  consider  a  different  methodology  to  address  these  issues.  Consider  the  meta-rule  shown 
above  as  the  single-step  transfer  relation 

\[ \{(\sigma, \sigma') \mid \sigma' = \sigma[x \mapsto \mathcal{E}[\![e]\!]\sigma]\}. \]

Imagine  a  universal  computer-representable  language  of  these  single-step  transfer  relations;  for 
instance  the  above  relation  might  be  written  as 


\[ x \mapsto e \]


Then,  given  some  analysis  task  such  as  sign  analysis  or  shape  analysis,  one  could  implement 
a  universal  “back-end”  that  analyzes  this  language  of  transfer  relations.  Thus,  to  apply  the 
analysis  to  a  particular  programming  language,  one  merely  expresses  its  semantics  in  terms  of 
this  language  of  single-step  transfer  relations  instead  of  the  usual  meta-rule  formulation. 
Imagine further that this computer-representable language of transfer relations is closed
under composition. For instance, the two successive variable assignments

\[ \stackrel{(y := x - 3;\ x := y + 5;\ t),\,(x := y + 5;\ t)}{\longmapsto} \qquad \stackrel{(x := y + 5;\ t),\,t}{\longmapsto} \]

might be symbolically composed as follows

\[ (y \mapsto x - 3) \;;\; (x \mapsto y + 5) \;=\; (x, y \mapsto x + 2,\ x - 3) \]

to  yield  a  computer  representation  of  this  two-step  execution  segment.  Then,  because  the 
analysis  back-end  is  designed  to  analyze  any  member  of  the  language  of  transfer  relations,  it  has 
maximum  flexibility  to  perform  any  number  of  steps  before  abstracting.  We  demonstrated  the 
benefits  of  the  above  example  in  Section  1.1;  there,  we  magically  rewrote  the  source  program,  but 
now  we  are  moving  toward  a  universal  language-independent  methodology  of  transfer  relations. 
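
For straight-line assignments with affine right-hand sides, symbolic composition amounts to substitution. The sketch below illustrates that special case only; it is not the general composition algorithm of Part II. The encoding of an expression as a dictionary from variables to coefficients (with the key `None` holding the constant term) and the names `substitute` and `compose` are assumptions made for this example.

```python
def substitute(expr, assignment):
    """Rewrite `expr`, which reads the store after `assignment`, in terms of the initial store."""
    result = {}
    for var, coeff in expr.items():
        if var is None:
            replacement = {None: 1}                        # constant term is unchanged
        else:
            replacement = assignment.get(var, {var: 1})    # unassigned variables keep their value
        for v, c in replacement.items():
            result[v] = result.get(v, 0) + coeff * c
    return {v: c for v, c in result.items() if c != 0}


def compose(first, second):
    """Net effect of performing the parallel assignment `first` and then `second`."""
    composed = dict(first)
    for var, expr in second.items():
        composed[var] = substitute(expr, first)
    return composed


first = {"y": {"x": 1, None: -3}}     # y := x - 3
second = {"x": {"y": 1, None: 5}}     # x := y + 5
net = compose(first, second)
print(net)
```

The result assigns `x + 2` to `x` and `x - 3` to `y`, both expressed over the initial store, which matches the composed relation shown in the text.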


Of  course,  the  example  immediately  above  is  quite  simple,  as  it  does  not  involve  important 
language features such as arrays, pointers, mutable data structures, or conditionals. The following
question remains. Is there a computer-representable language of transfer relations closed
under  composition  that  is  both 


•  expressive  enough  to  handle  a  wide  variety  of  imperative  and  applicative  language  features, 
and 

•  simple  enough  to  be  the  target  of  a  wide  variety  of  important  program  analyses,  such  as 
alias,  shape,  and  value  analyses? 


The  answer  is  yes,  and  this  language  of  transfer  relations  is  largely  the  subject  of  Part  II.  This 
leads  to  the  following  general  methodology  of  program  analysis. 


[Figure: diagram of the general methodology; the recoverable labels are Language 1, Language 2, and Language 3.]

Given  a  language  and  an  analysis  task,  one  first  describes  the  semantics  of  the  language 
in  terms  of  single-step  transfer  relations.  Then,  guided  by  a  strategy  to  suit  the  analysis  task 
and particular program at hand, some of these transfer relations are composed into bigger
steps, similar to our rewriting of the example program in Section 1.1. Finally, the particular
analysis problem uses these multi-step transfer relations in a manner appropriate to the task.
In some cases, such as the sign analysis of Section 1.1, it is appropriate to apply an abstract
interpretation  to  compute  an  abstract  fixed  point  of  these  transfer  relations.  In  other  cases, 
such  as  the  analysis  of  shape-preserving  properties  of  data-structure  maintenance  routines,  it 
may  be  more  appropriate  to  extract  the  property  of  interest  directly  from  the  multi-step  transfer 
relations,  without  designing  any  abstraction  or  performing  any  fixed-point  computation. 

Note that the compositions occur at the language-independent stage of transfer relations,
so although they are sometimes analogous to rewriting source-program instructions, as in Section 1.1,
that is not always the case. Also note that the analyses are now defined in terms
of transfer relations instead of source programs. This means that large parts of an analysis
do not have to be reimplemented for different languages. In general, however, some reengineering is
necessary because the language of transfer relations is parameterized by a set of primitive
operations, and those may change from source language to source language.


1.4  Overview  of  the  Dissertation 

•  Part  II  presents  the  language  of  transfer  relations  and  the  basic  algorithms  to  compose 
them  and  manipulate  them,  and  explains  the  general  procedure  for  modeling  the  dynamic 
semantics  of  a  programming  language  with  transfer  relations. 

•  Part  III  shows  how  to  model  a  variety  of  imperative  and  applicative  language  features 
with  transfer  relations. 

•  Part IV expands on Section 1.1 and Section 1.2 by sketching some ideas for how to design
program analyses around transfer relations.

•  Part V concludes.


Part  II 

Foundations 


Chapter  2 


Stores  and  Transfer  Relations 


The  foundation  of  our  study  is  the  store.  A  store  is  a  model  of  an  instantaneous  state  of  the 
memory  during  program  execution.  As  a  program  executes,  it  will  at  various  points  examine 
variables,  data  structures,  stack  frames,  and  so  on,  and  it  will  at  other  points  change  the  values 
of  variables,  alter  the  components  of  data  structures,  allocate  new  data  structures,  create  new 
stack  frames,  and  so  forth.  All  of  these  operations  are  modeled  as  examinations  or  alterations 
of  the  store.  Intuitively,  there  is  one  global  store  that  evolves  during  program  execution.  But 
semantically,  this  “global  store”  is  modeled  as  a  trace  of  stores.  Every  time  the  program  takes 
another  step,  another  store  is  added  to  the  sequence.  If  that  execution  step  modified  the  store, 
then  the  modification  will  be  reflected  in  the  latest  store.  Otherwise,  the  latest  store  will  just 
be  a  copy  of  the  previous  one.  In  this  way,  the  program  leaves  a  trace  of  stores. 

Now  consider  the  task  of  analyzing  a  program’s  execution.  Ideally,  one  would  actually  let 
the  program  run,  leaving  its  trace  of  stores  behind.  Then,  when  the  program  is  done,  one  could 
go  back  to  that  trace  and  analyze  everything  that  happened  during  that  execution.  The  trace 
of  stores  is  the  entire  execution  history,  and  with  perfect  knowledge  of  that  history  all  questions 
about  the  program’s  run-time  behavior  could  be  answered.  This  is  sometimes  called  profiling. 

This  approach  to  program  analysis  has  some  serious  problems. 

•  The  execution  may  not  terminate,  thus  leaving  behind  an  infinite  trace  of  stores.  So  it  is 
impossible  in  general  to  run  a  program  and  then  perform  a  post-mortem  analysis  on  its 
trace. 

•  If  the  initial  store  (initial  data,  values  of  free  variables,  and  so  on)  is  unknown,  then  it 
doesn’t  make  sense  to  analyze  the  execution  trace  of  just  one  execution.  One  would  have 
to  analyze  one  execution  from  all  possible  initial  stores,  and  in  general  there  are  an  infinite 
number  of  them. 

•  Even  if  the  initial  store  is  fixed  and  the  program  terminates,  the  execution  may  have  a 
large  number  of  steps,  and  it  is  not  feasible  to  record  the  entire  store  at  every  step. 
Program analysis is largely the study of how to cope with these issues. The usual approach
begins  with  the  observation  that  if  there  were  an  efficient  way  to  represent  in  a  computer  some 
interesting  but  infinite  sets  of  stores,  then  some  interesting  questions  about  a  program’s  run-time 
behavior  could  be  answered,  or  at  least  approximated,  automatically.  These  representations 
of  infinite  store  sets  can  be  thought  of  as  store  properties,  and  program  analysis  thus  becomes 
the  computation  of  properties  of  the  stores  that  can  arise  during  some  execution  from  a  store 
satisfying  some  initial  property.  For  instance,  in  Section  1.1  we  gave  an  example  of  an  analysis 
that  determines  a  sign  property  of  integer-valued  variables  at  every  syntactic  point  in  the 
program. 

Our  approach  is  to  begin  not  by  examining  the  stores  themselves,  but  how  stores  change 
over  the  course  of  the  execution  trace.  Suppose  that  a  program  analyzer  were  omnipotent  and 
could  examine  and  answer  any  questions  about  the  execution  traces,  even  infinite  ones,  from 
all  possible  initial  stores.  One  question  of  interest  might  involve  examining  pairs  of  stores  at 
different  points  along  the  trace,  to  see  what  the  differences  are  between  the  first  and  the  second. 
This  would  provide  information  about  what  happened  during  the  interval  of  execution  between 
those  points.  Now  reconsider  the  problems  listed  above: 


•  The  execution  may  never  terminate  and  thus  leave  behind  an  infinite  trace.  But  even  so, 
there  may  be  an  infinite  number  of  finite  intervals  during  the  execution  that  exhibit  the 
same  pattern  of  how  the  store  at  the  beginning  of  the  interval  relates  to  the  store  at  the 
end.  In  fact,  this  is  the  case  with  a  loop  in  the  program;  each  interval  corresponds  to  a 
single  iteration.  If  this  pattern  can  be  isolated,  then  it  is  not  necessary  to  examine  the 
entire  infinite  trace.  An  example  of  such  a  pattern  is  a  loop  invariant.  But  this  general 
concept  goes  beyond  loop  invariants.  For  instance,  one  may  relate  the  store  at  any  point 
during  a  loop  or  recursion  with  the  store  k  iterations  later  for  a  given  k. 

•  Even  if  the  initial  store  is  unknown,  there  may  be  a  commonality  in  the  change  between 
any  initial  store  and  the  store  at  some  later  point  in  the  trace.  This  is  similar  to  the 
situation  with  loops;  a  potentially  unbounded  number  of  trace  intervals  share  a  common 
net  effect  between  their  initial  and  final  stores.  Related  to  this  idea  is  the  use  of  weakest 
preconditions  to  describe  the  semantics  of  loops  [Dij76,  Wan77]. 

•  As a practical matter, even if the initial store is fixed and the program terminates, isolating
the  patterns  in  the  trace  provides  a  hope  of  making  the  analysis  feasible  in  practice. 


Such  a  pattern  or  commonality  in  the  way  one  store  evolves  into  another  is  simply  a  relation 
between  the  initial  and  final  stores.  We  call  these  transfer  relations.  It  turns  out  that  there  is 
a  simple  language  of  transfer  relations  that  covers  all  the  patterns  that  arise  during  program 
executions.  Also,  there  are  ways  to  compute  these  transfer  relations  and  use  them  to  reason 
about the executions. In this chapter, we introduce our model of stores and give the
language  of  transfer  relations. 


2.1  Stores 

We  make  the  fundamental  assumption  that  during  program  execution,  any  instantaneous  state 
of  the  memory  can  be  modeled  by  a  store.  A  store  is  parameterized  by  the  following  disjoint 
sets. 

$x \in Var$    a set of variables

$v \in Val$    a set of values

A store is then a function from l-values to values.

$\sigma \in Store = Lval \to Val$    stores

$w \in Lval = Var \cup (Val \times Val)$    l-values

We  parameterize  a  store  by  Val  because  we  would  like  to  develop  a  semantic  framework  that 
is  suitable  for  a  wide  variety  of  programming  languages  and  analysis  tasks.  However,  we  will 
require  that  Val  include  the  booleans  true  and  false  and  a  special  value  undef. 

\[ true,\ false,\ undef \in Val \]

The  most  natural  notion  of  a  store  is  a  partial  function,  mapping  exactly  the  l-values  that  are 
defined  to  their  respective  values.  But  instead,  for  technical  reasons  in  Chapter  3,  we  require  a 
store  to  be  a  total  function,  mapping  all  of  the  “undefined”  l-values  to  the  distinguished  value 
undef.  Throughout  this  dissertation,  undef  refers  to  an  undefined  or  error  value.  In  Chapter  4, 
we  will  discuss  further  the  treatment  of  errors. 

The “l” in l-value means “location”. Intuitively, an l-value represents a location in memory
that might be written or mutated as well as read. There are two kinds of l-values. The first
kind is simply a variable. The second kind is called a reference; it is a pair of a value $v \in Val$,
representing a data structure, and a value $v' \in Val$, representing an index into a mutable
component of that data structure. The l-value $(v, v') \in Lval$ is written $v.v'$.

Intuitively, a value represents the contents of a single mutable memory location, or in other
words, the contents of an l-value. A value might be a simple object such as an integer or a
boolean,  or  it  might  be  a  compound  object,  such  as  a  tuple  or  vector.  In  the  latter  case,  however, 
the  compound  object  must  be  immutable  because  it  represents  the  contents  of  a  single  mutable 
memory  location.  So,  for  instance,  one  should  not  model  a  (mutable)  Scheme  [ReC86]  cons  cell 
(1  .  2)  with  a  single  value,  but  rather  use  three  values:  one  for  the  cons  cell  itself,  one  for  1, 
and  one  for  2. 

A  store  is  then  a  function  from  l-values  to  values  that  describes  the  contents  of  the  memory. 
For  instance,  if  v  is  the  cons  cell  in  the  previous  paragraph,  the  store  would  map  the  references 
$v$.car and $v$.cdr to 1 and 2, respectively. Intuitively, a program execution begins in some initial
store $\sigma_0$ describing the initial state of memory, input data, and so forth, and then continually
modifies the memory while it is executing, producing a sequence of evolving stores $\sigma_1, \sigma_2, \sigma_3, \ldots$
corresponding  to  the  steps  of  the  execution. 
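
The store model described so far can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the class name `Store`, the string `"undef"` standing in for the distinguished value, the use of Python strings for variables and tuples for references, and the token `"cell#1"` are all hypothetical choices for this example.

```python
UNDEF = "undef"   # stands in for the distinguished value undef


class Store:
    """A total map from l-values to values; unbound l-values map to UNDEF.

    A variable is modeled as a str; a reference v.v' is modeled as the tuple (v, v').
    """

    def __init__(self, bindings=None):
        self._bindings = dict(bindings or {})

    def __call__(self, lvalue):
        # Totality: every l-value has a value, possibly UNDEF.
        return self._bindings.get(lvalue, UNDEF)

    def update(self, lvalue, value):
        """Return a new store that agrees with this one except at `lvalue`."""
        new_bindings = dict(self._bindings)
        new_bindings[lvalue] = value
        return Store(new_bindings)


cell = "cell#1"   # hypothetical token for the mutable cons cell (1 . 2)
store = Store({"p": cell, (cell, "car"): 1, (cell, "cdr"): 2})
later = store.update((cell, "car"), 7)   # a destructive update yields the next store in the trace
print(store((cell, "car")), later((cell, "car")), store("q"))
```

Because `update` returns a new store rather than mutating the old one, a sequence of updates naturally produces the trace of stores described above.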

We  stress  again  the  crucial  concept  that  l-values  represent  the  mutable  memory  locations. 
Some  programming  languages  include  data  structures  that  are  not  mutable — for  instance,  the 
tuples, records, and vectors in Standard ML [MTH90]. One would probably model these objects
simply as compound values rather than breaking them up into their components and indexing
those components by separate l-values in the store.

Example 1 Consider the C programming language. A value $v \in Val$ might correspond to any
of the different kinds of C data types:

•  An integer, real number, or character. In this case, $v$ would be that value.

•  A pointer. In this case, $v$ would be a token representing the pointer itself. In addition,
there would be a value $* \in Val$, and the l-value $v.*$ would represent the memory location
to which the pointer refers. A store would then map $v.*$ to the contents of the pointer.

•  A struct. In this case, $v$ would be a token (pointer) representing the root of the struct. In
addition, there would be a value $f \in Val$ for each field name $f$ in the structure, and the
l-value $v.f$ would represent the memory location of field $f$ of the struct. A store would
then map $v.f$ to the contents of that field of the struct.

•  An  array.  In  this  case,  v  would  be  a  token  (pointer)  identifying  the  array.  In  addition, 
every  non-negative  integer  n  would  be  in  Val,  and  the  l-value  v.n  would  represent  the 
memory  location  of  the  nth  array  element.  A  store  would  then  map  v.n  to  the  contents  of 
the  nth  element  of  the  array. 

The  above  example  illustrates  that  for  some  programming  languages,  the  set  Val  of  values 
might  include,  in  addition  to  the  base  values  of  the  language,  a  set  of  pointers  to  represent 
mutable  data  structures.  In  some  operational  semantics,  these  are  called  “locations”  or  “heap 
values”  [MFH95]  and  are  just  taken  from  an  arbitrary  infinite  set. 

Again,  we  stress  that  a  store  is  a  total  function.  This  is  not  intuitive,  because  at  any  time 
during  an  execution  of  a  program  in  any  reasonable  programming  language,  there  will  only  be 
a  finite  amount  of  data  actually  allocated  and  accessible  by  the  rest  of  the  execution.  But  this 
is why we require that Val include the distinguished value undef to represent the undefined
value. The intended use of stores is to model the state of memory during an execution of a
computer program. If an l-value $w \in Lval$ is undefined in the memory, then the store $\sigma$ modeling
that memory state would map $w$ to undef (i.e., $\sigma\,w = undef$). Therefore, the fact that we
require  stores  to  be  total  functions  is  not  a  limitation  of  expressiveness.  However,  for  minor 
technical  reasons  concerning  the  symbolic  composition  of  transfer  relations  in  Chapter  3,  it  will 
be  convenient  for  stores  to  be  total  functions. 


Stores  as  graphs 

It is sometimes helpful to think of a store $\sigma$ as a graph with directed labeled edges. The set of
nodes is

\[ Val \cup \{\bullet\} \]

where $\bullet$ is a distinguished root node not in Val. The set of labeled directed edges is

\[ \{\, \bullet \xrightarrow{x} v \mid \sigma\,x = v \,\} \;\cup\; \{\, v \xrightarrow{v'} v'' \mid \sigma(v.v') = v'' \,\}. \]

Note  the  following  properties  of  any  store  graph. 

•  Because  a  store  is  a  function  rather  than  a  general  relation,  no  two  outgoing  edges  of  the 
same  node  can  have  the  same  label. 

•  Node  •  has  no  incoming  edges,  and  all  its  outgoing  edges  are  labeled  with  variables. 

At any particular time during program execution, all the l-values that are undefined in the store
at  that  time  will  point  to  undef.  Because  of  this  choice  of  stores  as  total  functions,  a  store 
graph  will  in  general  be  infinite.  We  ignore  this  technical  detail,  as  our  perspective  of  stores  as 
graphs  is  solely  for  expository  purposes. 
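
The edge set of a store graph can be computed directly from the finitely many defined l-values. The following sketch makes several assumptions for illustration only: the store fragment is a plain dictionary, variables are strings, references are `(v, v')` tuples, and the names `ROOT` and `store_edges` are hypothetical. Edges into undef are omitted, since only the defined fragment is enumerated.

```python
ROOT = "•"   # the distinguished root node, assumed not to be a value


def store_edges(bindings):
    """Labeled edges (source, label, target) for the defined part of a store.

    A variable x bound to v gives an edge from ROOT labeled x; a reference (v, v')
    bound to v'' gives an edge from v labeled v'.
    """
    edges = set()
    for lvalue, value in bindings.items():
        if isinstance(lvalue, tuple):
            source, label = lvalue
        else:
            source, label = ROOT, lvalue
        edges.add((source, label, value))
    return edges


# A variable c bound to a cons cell v whose car is 1 and whose cdr is 2.
bindings = {"c": "v", ("v", "car"): 1, ("v", "cdr"): 2}
edges = store_edges(bindings)
print(sorted(edges, key=str))
```

Because the bindings form a function, no node can receive two outgoing edges with the same label, which is the first property noted above.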

Example  2  Again,  consider  the  C  programming  language.  Assume  the  set  Val  includes  C 
integers  and  C  characters. 

•  If at some point during an execution, the variable $x \in Var$ is bound to a pointer to a location
containing the integer 42, then the store $\sigma$ at that point of execution would contain the
following path from the root node:

    •  --x-->  v1  --*-->  42

Here, $v_1 \in Val$ represents the pointer itself.

•  If in addition, y is bound to a struct with a field index, which is the integer 618, and with
a field data, which points to a two-element array whose elements are the chars 'A' and
't', then $\sigma$ would also contain the following paths from the root node:

    •  --y-->  v2  --index-->  618
               v2  --data-->   v3  --*-->  v4  --0-->  'A'
                                           v4  --1-->  't'

Here, $v_2 \in Val$ represents the struct, $v_3 \in Val$ represents the char-array pointer, and
$v_4 \in Val$ represents the char array.


•  If in addition, z is bound to a pointer that dereferences to itself, then $\sigma$ would also contain
the following subgraph:

    •  --z-->  v5  --*-->  v5

Here, $v_5 \in Val$ represents the pointer itself.
These three subgraphs of $\sigma$ describe precisely the data that is reachable from variables x, y, and
z, respectively, at this point of execution.

In the next section, we study ways to generate a value from other values in a store. This is
done with primitive operations.


2.2  Primitive  Operations 


Our framework is parameterized by a set Primop of primitive operations. Each operation $p \in Primop$
has an arity, which may be zero or more. A primitive operation describes a way in
which zero or more values evaluate to a single value. The phrase

\[ p(v_1, \ldots, v_n) \Downarrow_\sigma v \]

means that the n-ary primitive operation $p \in Primop$ applied to the values $v_1, \ldots, v_n \in Val$
in store $\sigma \in Store$ evaluates to value $v \in Val$. There are several distinct classes of primitive
operations, which we characterize below.

All primitive operations must satisfy the following condition.

Condition 1 (Definedness of primitives) For any n-ary primitive operation $p \in Primop$,
for any n values $v_1, \ldots, v_n \in Val$, and for any store $\sigma \in Store$, there is at least one value $v \in Val$
such that $p(v_1, \ldots, v_n) \Downarrow_\sigma v$. In other words,

\[ \forall p, v_1, \ldots, v_n, \sigma.\ \exists v.\ p(v_1, \ldots, v_n) \Downarrow_\sigma v \]

This condition states that primitive operations must be defined everywhere. Conceptually, this
requirement is analogous to the requirement that a store is defined everywhere (i.e., for all
l-values). The condition is required for minor technical reasons in the development to follow.
However, as we explained about stores, this condition does not limit the expressiveness of the
framework because Val includes undef representing the “undefined value”.

Indeed, the two main parameters of our framework (the set Val of values and the set Primop
of primitive operations with associated evaluation relation) truly go hand-in-hand. This will
come out in Chapter 5 when we describe the design of a programming language using our
development.

It is the parameterization of the framework by the set of primitive operations that makes
this methodology particularly flexible and useful for a variety of applications. Yet, it is not the
case that we are factoring all of the important semantic concepts out along with the primitive
operations. This is because our concept of a primitive operation is a computation without
store modification. The encapsulation of such operations as the main parameter of the analysis
framework turns out to be quite useful and powerful.
2.2.1  Deterministic  and  context-independent  primitive  operations

It will be convenient to introduce terms for several different classes of primitive operations.

Definition 1 (Deterministic and nondeterministic primitive operations) A primitive
operation $p \in \mathit{Primop}$ is said to be deterministic if

\[ p(v_1, \ldots, v_n) \Rightarrow_\sigma v \]

and

\[ p(v_1, \ldots, v_n) \Rightarrow_\sigma v' \]

implies that $v = v'$. Otherwise, p is said to be nondeterministic.

A typical programming language will need only deterministic primitive operations, but certain
applications of the framework will make use of nondeterministic operations, and so we include
them in the general framework.

Definition 2 (Context-independent and -dependent operations) A primitive operation
is said to be context-independent if for any stores $\sigma$ and $\sigma'$ and values $v_1, \ldots, v_n, v$,

\[ p(v_1, \ldots, v_n) \Rightarrow_\sigma v \iff p(v_1, \ldots, v_n) \Rightarrow_{\sigma'} v. \]

In other words, the evaluation of p does not depend on the store. In this case, we may use the
abbreviated form

\[ p(v_1, \ldots, v_n) \Rightarrow v \]

for evaluation. Otherwise, p is said to be context-dependent.

The simplest kinds of primitive operations are deterministic context-independent operations.
These will come up so often that it is worth introducing a definition just for them.

Definition 3 (Simple primitive operations) If primitive operation p is both deterministic
and context-independent, then it is said to be simple.


2.2.2  Examples  of  primitive  operations 

We present some examples of each kind of primitive operation. Although they are just examples
for the moment, some of them will play a major role in the development to follow. This section
assumes that

\[ p(v_1, \ldots, v_n) \Rightarrow_\sigma \mathit{undef} \]

unless otherwise defined below. (Recall that $\mathit{undef} \in \mathit{Val}$ represents the undefined value.)
We also assume that Val includes the integers.
Simple  operations

Recall that simple operations are operations that are both deterministic, in that they evaluate
to only a single value, and context-independent, in that their evaluation does not depend on
the store.

Example 3 Each value $v \in \mathit{Val}$ specifies a nullary primitive operation $v \in \mathit{Primop}$ that
evaluates to v:

\[ v() \Rightarrow v \]

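Example 4 Here are some standard arithmetic primitive operations found in programming
languages. In these definitions, n and n' are integer values.

\[
\begin{array}{rcl}
+(n, n') & \Rightarrow & (n + n') \\
-(n, n') & \Rightarrow & (n - n') \\
\times(n, n') & \Rightarrow & (n \times n') \\
<(n, n') & \Rightarrow & (n < n') \\
>(n, n') & \Rightarrow & (n > n')
\end{array}
\]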

Example 5 Below are boolean operations for conjunction, equality, and inequality. We will
use the first two (& and =) internally in our analysis framework.

\[
\begin{array}{rcl}
\&(\mathit{true}, v) & \Rightarrow & v \\
\&(v, \mathit{true}) & \Rightarrow & v \\
\&(\mathit{false}, v) & \Rightarrow & \mathit{false} \\
\&(v, \mathit{false}) & \Rightarrow & \mathit{false} \\
=(v, v') & \Rightarrow & (v = v') \\
<>(v, v') & \Rightarrow & (v \neq v')
\end{array}
\]

Note that there are cases in which these operations are given undef and evaluate to a boolean
value. Intuitively, these are error cases that are allowed to "run wild", and so this choice is
reasonable. We will discuss this more in a later chapter.

Example 6 The operation if implements conditional expressions.

\[
\begin{array}{rcl}
\mathit{if}(\mathit{true}, v, v') & \Rightarrow & v \\
\mathit{if}(\mathit{false}, v, v') & \Rightarrow & v'
\end{array}
\]

Example 7 Example 1 demonstrated the need for pointer values to correspond to the roots of
mutable data structures. Suppose that for every natural number n there is a pointer $\langle n \rangle \in \mathit{Val}$,
and ptr is a unary primitive operation that casts an integer n to the pointer $\langle n \rangle$.

\[ \mathit{ptr}(n) \Rightarrow \langle n \rangle \]

We will return to ptr in Chapter 5.
Nondeterministic  context-independent  operations

These operations may evaluate to more than one value, but do not depend upon the store.
These operations are similar to types, and this similarity provides some intuition about why we
include the possibility of nondeterministic operations. After all, one of the major applications
of program analysis is to infer types of data objects and expressions.

Example 8 The nullary operation pos may evaluate to any positive integer:

\[ \mathit{pos}() \Rightarrow n \quad \text{if } n \in \{1, 2, \ldots\} \]

Example 9 The nullary operation bool may evaluate to any boolean:

\[
\begin{array}{rcl}
\mathit{bool}() & \Rightarrow & \mathit{true} \\
\mathit{bool}() & \Rightarrow & \mathit{false}
\end{array}
\]

Deterministic  context-dependent  operations

These operations always evaluate to a single value, but depend on the store in which the
evaluation occurs.

Example 10 The binary operation deref dereferences edges in a store. Given values v and
v', it evaluates in store $\sigma$ to the value to which $\sigma$ binds the l-value (v.v'):

\[ \mathit{deref}(v, v') \Rightarrow_\sigma \sigma(v.v') \]

This last example is important, as we will see in the next section.

The  next  example  hints  at  an  application  of  our  framework  to  the  analysis  of  the  shapes  of 
data  structures,  and  also  illustrates  the  point  that  our  notion  of  primitive  operations  does  not 
need  to  be  limited  to  operations  that  might  be  available  in  a  programming  language. 

Example 11 The unary operation tree, when evaluated in store $\sigma$ with value v, evaluates to
true if the subgraph of $\sigma$ rooted at v (possibly representing the root of some data structure) and
not including node undef is a tree (in graph-theoretic terms). Otherwise, it evaluates to false.
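
As a rough illustration of Example 11, the following Python sketch checks the tree property on a finite store. The dictionary encoding of stores (keys that are pairs `(v, label)` stand for field edges, all other keys for variable edges), the string `"undef"`, and the names `out_edges` and `is_tree` are assumptions made for this sketch only; they are not part of the framework.

```python
UNDEF = "undef"  # assumed stand-in for the undefined value

def out_edges(store, node):
    """Targets of the field edges leaving `node` (keys of the form (node, label))."""
    return [target for key, target in store.items()
            if isinstance(key, tuple) and key[0] == node]

def is_tree(store, root):
    """True if the subgraph reachable from `root`, ignoring undef, is a tree."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node == UNDEF:
            continue          # the node undef is excluded from the subgraph
        if node in seen:
            return False      # reached twice: sharing or a cycle
        seen.add(node)
        stack.extend(out_edges(store, node))
    return True

tree_store = {"x": "a", ("a", "left"): "b", ("a", "right"): "c"}
dag_store = {**tree_store, ("b", "next"): "c"}   # c now has two parents
loop_store = {"x": "a", ("a", "next"): "a"}      # a points to itself
print(is_tree(tree_store, "a"), is_tree(dag_store, "a"), is_tree(loop_store, "a"))
```

Note that the result depends on the whole store, not only on the argument value, which is what makes tree context-dependent in the sense of Definition 2.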

Although Primop is a parameter of our analysis framework, we demand that it include the
following operations described above:

\[ \mathit{true},\ \mathit{false},\ \&,\ =,\ \mathit{if} \in \mathit{Primop} \]

The first two simply provide a way of denoting booleans as expressions. (Recall that the
booleans were the only objects other than undef that we demand to be members of Val.) The
last three are used internally in the transfer-relation composition algorithm in Chapter 3.
We further remark that any nontrivial application of our methodology will need deref in order
to build expressions that can perform general examinations of the store.
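
The classification from Definitions 1 to 3 can be sketched in Python by modelling a primitive operation as a function from a store and argument values to the set of values it may evaluate to. This encoding, the helper names, and the finite sample of stores and values are all assumptions of the sketch; a check over finite samples can refute determinism or context-independence but cannot prove either.

```python
from itertools import product

UNDEF = "undef"  # assumed stand-in for the undefined value

def plus(store, n, m):
    """+ : simple (deterministic and context-independent)."""
    if type(n) is int and type(m) is int:
        return {n + m}
    return {UNDEF}

def any_bool(store):
    """bool : nondeterministic, context-independent."""
    return {True, False}

def deref(store, v, label):
    """deref : deterministic, context-dependent."""
    return {store.get((v, label), UNDEF)}

def is_deterministic(op, arity, stores, values):
    """Every sampled application evaluates to exactly one value."""
    return all(len(op(s, *args)) == 1
               for s in stores for args in product(values, repeat=arity))

def is_context_independent(op, arity, stores, values):
    """Every sampled application has the same results in all sampled stores."""
    return all(len({frozenset(op(s, *args)) for s in stores}) == 1
               for args in product(values, repeat=arity))

stores = [{}, {(1, "car"): 5}]
values = [0, 1, "car"]
for name, op, arity in [("+", plus, 2), ("bool", any_bool, 0), ("deref", deref, 2)]:
    print(name, is_deterministic(op, arity, stores, values),
          is_context_independent(op, arity, stores, values))
```

Every function returns a non-empty set, which mirrors Condition 1: an operation applied outside its intended domain evaluates to undef rather than to nothing.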
2.3  Expressions  and  L-expressions


Our framework is based upon the study of binary relations between stores, called transfer
relations, that describe how a store at one point of an execution evolves into a store at some
future point of the execution. We want to write down these transfer relations, represent them
in a computer, and analyze them with algorithms. In order to do this, it turns out that we
will need two languages of computer-representable terms, one to describe the nodes in a store
and one to describe the edges in a store. Later, we will use these two languages to develop a
language of transfer relations.

Elements of the first language are called expressions; given a store $\sigma \in \mathit{Store}$, an expression
$e \in \mathit{Exp}$ denotes one or more values $v \in \mathit{Val}$. If e denotes v in $\sigma$, then we say that "e evaluates
to v in $\sigma$". The same expression may evaluate to different values in different stores.

Elements of the second language are called l-expressions; given a store $\sigma \in \mathit{Store}$, the
l-expression $l \in \mathit{Lexp}$ denotes one or more l-values $w \in \mathit{Lval}$. If l denotes w in $\sigma$, then we say that
"l evaluates to w in $\sigma$". The same l-expression may evaluate to different l-values in different
stores.

The syntax of the language of expressions and l-expressions is parameterized by a set Primop
of primitive operations, described in Section 2.2, and has the following inductive definition.


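\[
\begin{array}{rcll}
e & \in & \mathit{Exp} ::= x \mid p(e_1, \ldots, e_n) & \text{expressions} \\
l & \in & \mathit{Lexp} ::= x \mid e.e' & \text{l-expressions} \\
p & \in & \mathit{Primop} & \text{primitive operations (given)} \\
x & \in & \mathit{Var} & \text{variables (given)}
\end{array}
\]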

There  are  two  types  of  expressions: 

•  A variable $x \in \mathit{Var}$. This expression evaluates in store $\sigma \in \mathit{Store}$ to the (unique) value
$v \in \mathit{Val}$ such that $\bullet \xrightarrow{x} v$ is an edge in $\sigma$. In other words, x evaluates in $\sigma$ to $(\sigma\, x)$.

•  An application of an n-ary primitive operation $p \in \mathit{Primop}$ to n expressions $e_1, \ldots, e_n \in \mathit{Exp}$.
This expression evaluates in store $\sigma \in \mathit{Store}$ to value $v \in \mathit{Val}$ if expression $e_i$ evaluates
in store $\sigma$ to value $v_i$, for $i \in \{1, \ldots, n\}$, and p applied to $(v_1, \ldots, v_n)$ evaluates in store $\sigma$
to v.


The phrase

\[ p \]

denotes the nullary primitive application $p()$. The phrase

\[ e \; p \; e' \]

denotes the binary primitive application $p(e, e')$.

There  are  two  types  of  l-expressions: 
•  A variable $x \in \mathit{Var}$. This l-expression evaluates to itself in any store, and can be thought
of informally as the dangling edge $\bullet \xrightarrow{x}$.

•  A reference expression e.e'. This l-expression evaluates in store $\sigma \in \mathit{Store}$ to the l-value
$v.v' \in \mathit{Lval}$ if e evaluates in $\sigma$ to value $v \in \mathit{Val}$ and e' evaluates in store $\sigma$ to value $v' \in \mathit{Val}$.
This l-value can be thought of informally as the dangling edge $v \xrightarrow{v'}$.

Formally, the interpretations of expressions and l-expressions are given by the following
relations.


•  The phrase $e \vdash_\sigma v$ means that the expression e evaluates in store $\sigma$ to value v.

•  The phrase $l \vdash_\sigma w$ means that the l-expression l evaluates in store $\sigma$ to l-value w.


Because a variable is both an expression and an l-expression, this notation may seem ambiguous.
But in the former case, the right-hand side will be a value, and in the latter case it will be an
l-value.

The  following  rules  inductively  define  these  relations. 


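\[
\frac{}{x \vdash_\sigma (\sigma\, x)}
\qquad
\frac{e_i \vdash_\sigma v_i \;\; (1 \le i \le n) \qquad p(v_1, \ldots, v_n) \Rightarrow_\sigma v}{p(e_1, \ldots, e_n) \vdash_\sigma v}
\qquad \text{expression evaluation}
\]

\[
\frac{}{x \vdash_\sigma x}
\qquad
\frac{e \vdash_\sigma v \qquad e' \vdash_\sigma v'}{(e.e') \vdash_\sigma (v.v')}
\qquad \text{l-expression evaluation}
\]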


The following lemma states that every expression (l-expression) evaluates to at least one
value (l-value).

Lemma 1 (Definedness of expressions and l-expressions) For any expression $e \in \mathit{Exp}$,
for any store $\sigma \in \mathit{Store}$, there is at least one value $v \in \mathit{Val}$ such that $e \vdash_\sigma v$. For any l-expression
$l \in \mathit{Lexp}$, for any store $\sigma \in \mathit{Store}$, there is at least one l-value $w \in \mathit{Lval}$ such that $l \vdash_\sigma w$. In
other words:

•  $\forall e \in \mathit{Exp}, \sigma \in \mathit{Store}.\ \exists v \in \mathit{Val}.\ e \vdash_\sigma v$

•  $\forall l \in \mathit{Lexp}, \sigma \in \mathit{Store}.\ \exists w \in \mathit{Lval}.\ l \vdash_\sigma w$

Proof: From Condition 1 on primitive operations and by straightforward induction on
expression and l-expression evaluation. □
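
The two evaluation relations can be sketched as a small Python interpreter that returns the set of all possible results. The tuple encoding of terms (`("var", x)`, `("const", v)`, `("app", name, args)`, `("ref", e, e')`), the dictionary encoding of stores, and the three sample primitives are assumptions of this sketch rather than part of the formal development.

```python
from itertools import product

UNDEF = "undef"  # assumed stand-in for the undefined value

# Sample primitives: each maps a store and argument values to a set of results.
PRIMOPS = {
    "+": lambda s, n, m: {n + m} if type(n) is int and type(m) is int else {UNDEF},
    "deref": lambda s, v, label: {s.get((v, label), UNDEF)},
    "bool": lambda s: {True, False},
}

def eval_exp(e, store):
    """Set of all values v such that e evaluates to v in store."""
    if e[0] == "var":
        return {store.get(e[1], UNDEF)}   # stores are total: default is undef
    if e[0] == "const":                   # a value used as a nullary primitive
        return {e[1]}
    _, name, args = e
    results = set()
    for vals in product(*(eval_exp(a, store) for a in args)):
        results |= PRIMOPS[name](store, *vals)
    return results

def eval_lexp(l, store):
    """Set of all l-values w such that l evaluates to w in store."""
    if l[0] == "var":
        return {l[1]}                     # a variable evaluates to itself
    _, e1, e2 = l
    return {(v, label) for v in eval_exp(e1, store) for label in eval_exp(e2, store)}

store = {"x": 1, (1, "*"): 7}
x_star = (("var", "x"), ("const", "*"))
print(eval_lexp(("ref", *x_star), store))               # the l-value x.*
print(eval_exp(("app", "deref", list(x_star)), store))  # the value stored there
print(len(eval_exp(("app", "bool", []), store)))        # number of possible values
```

Because every sample primitive returns a non-empty set, every call to `eval_exp` and `eval_lexp` here also returns a non-empty set, in line with Lemma 1.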

It is important to distinguish expressions and l-expressions that do not contain any
applications of nondeterministic primitive operations.

Definition 4 (Deterministic expressions and l-expressions) For expression $e \in \mathit{Exp}$
(respectively, l-expression $l \in \mathit{Lexp}$), if neither e nor any subexpression of e (respectively, if no
subexpression of l) is an application of a nondeterministic primitive operation, then we say that
e (respectively, l) is deterministic. The phrase determ(e) (respectively, determ(l)) denotes this
fact.

The following lemma states that deterministic expressions and l-expressions always evaluate to
exactly one value and l-value, respectively.

Lemma 2 (Deterministic expressions and l-expressions) For any deterministic
expression $e \in \mathit{Exp}$, for any store $\sigma \in \mathit{Store}$, there is exactly one value $v \in \mathit{Val}$ such that $e \vdash_\sigma v$. For
any deterministic l-expression $l \in \mathit{Lexp}$, for any store $\sigma \in \mathit{Store}$, there is exactly one l-value
$w \in \mathit{Lval}$ such that $l \vdash_\sigma w$. In other words,

•  $\mathit{determ}(e) \Rightarrow \forall \sigma \in \mathit{Store}.\ \exists! v \in \mathit{Val}.\ e \vdash_\sigma v$

•  $\mathit{determ}(l) \Rightarrow \forall \sigma \in \mathit{Store}.\ \exists! w \in \mathit{Lval}.\ l \vdash_\sigma w$

Proof: From Lemma 1, from Definition 1, and from straightforward induction on expression
and l-expression evaluation. □

If  all  primitive  operations  in  Primop  are  deterministic,  then  all  expressions  are  deterministic. 
But  even  if  there  are  nondeterministic  primitive  operations  in  Primop,  it  will  be  important  to 
distinguish  deterministic  expressions  for  the  symbolic  evaluation  of  certain  primitive  operations 
in  Chapter  3. 

At first it may seem as if our language of expressions is too restrictive: why not allow
arbitrary l-expressions instead of just variables? The reason is that one may treat the l-expression
e.e' as an expression by using the deref primitive operation that we introduced in the previous
section. Consider the C expression *x. On the left-hand side of an assignment statement, *x
refers to a memory location, or an l-value in our terms. But on the right-hand side, it refers to
the contents of that memory location, or a value in our terms. C has a uniform syntax to
handle both cases; they are both expressions. In contrast, in our framework the term on the
left-hand side would be an l-expression, namely x.* (where $* \in \mathit{Val}$ as in Example 1), whereas
the term on the right-hand side would be an expression, namely the primitive application
deref(x, *).

Therefore, deref is a rather distinguished primitive operation in that it provides the ability
to examine the store beyond the level of variables. One will almost certainly need
it in any analysis application for any language. Inspired by the above discussion, we
introduce a special syntax for it. The term

\[ e.e' \]

will, depending on the context in which it appears, refer to either the l-expression e.e' or the
primitive-application expression deref(e, e').


2.4  Simple  Transfer  Relations 

Our  central  philosophy  is  that  it  is  advantageous  to  analyze  relations  between  two  stores.  These 
relations  are  called  transfer  relations.  The  idea  of  studying  transfer  relations  is  a  paradigm  shift 
from  most  analysis  frameworks,  as  the  focus  is  usually  on  reasoning  about  properties  of,  or  sets 
of,  individual  stores.  Yet  a  program  fragment  relates  initial  stores  to  final  stores,  and  so  it 
seems  intuitive  to  study  these  relations. 

2.4.1  Only  some  relations  are  natural 

At  first  blush,  it  seems  as  if  the  set  of  transfer  relations  is  the  set 

\[ \mathcal{P}(\mathit{Store} \times \mathit{Store}) \]

of  binary  relations  between  stores.  But  such  an  unrestricted  notion  of  transfer  relation  has  two 
related  disadvantages. 

•  For  analysis  purposes,  we  will  want  to  design  computer  representations  of  transfer  relations 
(ideally,  concise  representations),  and  it  is  impossible  to  do  so  for  general  binary  relations 
between  stores. 

•  Earlier,  we  gave  the  intuition  that  for  the  purpose  of  static  program  analysis,  a  transfer 
relation  corresponds  to  a  fragment  of  the  execution  of  some  program  in  some  programming 
language.  But  there  are  many  binary  relations  between  stores  that  would  never  come  from 
any  such  execution  fragment.  Indeed,  there  are  many  such  relations  that  are  not  even 
computable.  These  relations  are  unnatural  in  that  they  do  not  arise  during  program 
execution,  and  they  are  thus  of  no  use  for  reasoning  about  programs. 

The  key  is  to  identify  the  kinds  of  transfer  relations  that  might  actually  come  about  as  part  of  a 
computer  program’s  execution.  Fortunately,  there  is  a  class  of  such  relations  that  is  sufficiently 
expressive  and  yet  conducive  to  automatic  reasoning  and  analysis.  We  will  demonstrate  this  in 
later  chapters,  where  we  model  real  programming  languages  with  transfer  relations  and  then 
use  those  relations  to  analyze  source  programs. 

Abstractly, apart from any particular language or program, one basic kind of transfer
relation is a relation that updates a store graph by assigning or changing the node to which an
edge in the graph points. These relations can describe dynamic actions in a programming
language that modify memory in some way. We already have both the language of expressions to
denote nodes and the language of l-expressions to denote edges. An assignment relation is then
described by an l-expression, denoting an edge to be assigned or reassigned, and an expression,
denoting a node to which the edge must point.

Another basic kind of transfer relation is a relation that simply filters through stores that
satisfy a certain property and rejects the stores that do not. This is the most basic kind
of conditional operation, and as such will be necessary to express the dynamic behavior of
most programming languages. Again, we can use the language of expressions to specify these
properties, and so a filter relation is described by an expression.

One can then build bigger relations from these basic relations.


2.4.2  Building  natural  transfer  relations 

We  wish  to  describe  a  set 

\[ \mathit{TrRel} \subseteq \mathcal{P}(\mathit{Store} \times \mathit{Store}) \]

of natural transfer relations. By natural, we mean informally that it is reasonable to imagine
that the transfer relation in question might correspond to a fragment of an execution of some
program in some programming language. In other words, suppose that at a certain point in the
middle of an execution of some program in some language, store $\sigma_1$ describes the state of the
memory at that point. As the execution continues from that point, it will produce a sequence
of evolving stores $\sigma_2, \sigma_3, \ldots$ corresponding to the steps of the execution. Then for each n > 0,
there should be some transfer relation $\Delta \in \mathit{TrRel}$ that relates store $\sigma_1$ to store $\sigma_n$; in other
words,

\[ \sigma_1 \;\Delta\; \sigma_n. \]

The  idea  is  to  keep  the  set  TrRel  as  small  as  possible,  but  still  large  enough  that  one  could 
model  the  operational  semantics  of  realistic  programming  languages  using  only  these  relations. 
Fortunately,  this  is  quite  easy  to  do  in  a  rather  satisfactory  and  intuitive  manner. 

We  will  define  the  set  TrRel  inductively. 

•  There  are  four  types  of  basic  relations  in  TrRel: 

—  The empty relation $\emptyset$.

—  The identity relation $\bullet$, which relates only identical stores. Formally:

\[ \sigma \;\bullet\; \sigma \]

—  The assignment relation $l \mapsto e$, where $l \in \mathit{Lexp}$ and $e \in \mathit{Exp}$. If l-expression l
evaluates to l-value w in $\sigma$ and expression e evaluates to value v in $\sigma$, then $l \mapsto e$
updates store $\sigma$ by binding w to v. Formally:

\[ \frac{l \vdash_\sigma w \qquad e \vdash_\sigma v}{\sigma \;(l \mapsto e)\; (\sigma[w \mapsto v])} \]

Here, $\sigma[w \mapsto v]$ is the store that maps w to v and is otherwise identical to $\sigma$. Formally,
it is defined as follows:

\[
(\sigma[w \mapsto v])\, w' =
\begin{cases}
v & \text{if } w = w' \\
\sigma\, w' & \text{otherwise}
\end{cases}
\]

Note that if $\neg\mathit{determ}(l)$ (i.e., if a nondeterministic primitive operation appears in l)
then l may evaluate to more than one l-value (and similarly for e), and so $l \mapsto e$
can relate a store on the left to several different stores on the right.
—  The filter relation $e?$, where $e \in \mathit{Exp}$, which relates a store $\sigma$ to itself only if
expression e evaluates to the value true in $\sigma$. Formally:

\[ \frac{e \vdash_\sigma \mathit{true}}{\sigma \;(e?)\; \sigma} \]

•  If $\Delta, \Delta' \in \mathit{TrRel}$, then their relational composition $\Delta; \Delta'$ (alternatively, $\Delta' \circ \Delta$) is in TrRel.
We use the standard definition of relational composition:

\[ \frac{\sigma \;\Delta\; \sigma' \qquad \sigma' \;\Delta'\; \sigma''}{\sigma \;(\Delta; \Delta')\; \sigma''} \]

Note that the identity relation $\bullet$ can be defined as the filter relation $\mathit{true}?$, where, as
described in Section 2.2.2, true denotes the nullary application of the primitive operation
defined by $\mathit{true} \in \mathit{Val}$. Similarly, the empty relation $\emptyset$ can be defined as the filter relation
$\mathit{false}?$. But it is more convenient to have distinguished representations for each of these two
special cases.
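
The inductive definition of TrRel can be sketched directly in Python. In this sketch, which rests on several assumptions, a store is a dictionary (missing keys read as undef), an expression or l-expression is a function from a store to the set of its possible results, and a transfer relation is a function from a store to the list of stores it is related to. All names below are hypothetical.

```python
UNDEF = "undef"  # assumed stand-in for the undefined value

# Expressions and l-expressions are modelled as functions: store -> set of results.
def var(x):
    return lambda s: {s.get(x, UNDEF)}

def const(v):
    return lambda s: {v}

def plus(e1, e2):
    def ev(s):
        return {a + b if type(a) is int and type(b) is int else UNDEF
                for a in e1(s) for b in e2(s)}
    return ev

def lvar(x):
    return lambda s: {x}

# A transfer relation is modelled as a function: store -> list of related stores.
def empty():
    return lambda s: []

def identity():
    return lambda s: [s]

def assign(l, e):
    return lambda s: [{**s, w: v} for w in l(s) for v in e(s)]

def filt(e):
    return lambda s: [s] if any(v is True for v in e(s)) else []

def compose(d1, d2):
    return lambda s: [s2 for s1 in d1(s) for s2 in d2(s1)]

s = {"x": 10, "y": 5}
two_steps = compose(assign(lvar("x"), const(2)),
                    assign(lvar("x"), plus(var("x"), const(1))))
one_step = assign(lvar("x"), const(3))
print(two_steps(s) == one_step(s))
print(filt(const(True))(s) == identity()(s), filt(const(False))(s) == empty()(s))
```

The last two lines compare relations on a single sample store only: the first agrees with the claim that assigning 2 and then incrementing behaves like assigning 3, and the second with the remark that the identity and empty relations behave like the filters on true and false. Agreement on one store is an illustration, not a proof of equality.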


2.4.3  Examples  of  transfer  relations 

As described at the beginning of this chapter, our goal for the sake of generality is to develop
a framework for expressing data and operations on data in a language-independent manner.
Nevertheless, it is illuminating at this point to look at some examples of transfer relations and
consider how they might arise during the execution of a computer program.


Example 12 The transfer relation

\[ x \mapsto 2 \;;\; x \mapsto x + 1 \]

is equal to (in other words, precisely the same relation as) the transfer relation

\[ x \mapsto 3 \]

that assigns variable x to be 3. In other words, it changes any store by redirecting the edge
$\bullet \xrightarrow{x}$ to $\bullet \xrightarrow{x} 3$.

Here, $+ \in \mathit{Primop}$, and the integers are included in Val and as nullary operations in Primop.


Example 13 The transfer relation

\[ (x < 0)? \;;\; x \mapsto 0 - x \]

relates any store in which x is bound to a negative number to a store in which x is bound to the
absolute value of that number and that is otherwise identical. Furthermore, it relates every store
in which x is not bound to a negative number to no store at all.
Example 14 Imagine a use of stores and transfer relations to model a language with
heap-allocated data structures. One way to model data allocation is to maintain a convention that
the semantic variable H holds the index of the next available pointer. Then one can use ptr
from Example 7 to generate that pointer. By convention, we write

\[ \mathit{ptr}(e) \]

as

\[ \langle e \rangle \]

Then the transfer relation

\[ x \mapsto \langle H \rangle \;;\; x.\mathit{car} \mapsto y \;;\; x.\mathit{cdr} \mapsto z \;;\; H \mapsto H + 1 \]

allocates a new record that has two fields: car, which is assigned to y's value (which may be
undef), and cdr, which is assigned to z's value (which may be undef). It then assigns x to be this
record.
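
A minimal Python sketch of this allocation convention follows. It assumes that H is bound to an integer, that stores are dictionaries in which missing keys read as undef, and that pointers are modelled by a hypothetical `Ptr` class; since only deterministic operations are involved, the composed relation is written as a plain function from stores to stores.

```python
from dataclasses import dataclass

UNDEF = "undef"  # assumed stand-in for the undefined value

@dataclass(frozen=True)
class Ptr:
    """Hypothetical model of the pointer value <n>."""
    index: int

def allocate_cons(store):
    """Apply x -> <H> ; x.car -> y ; x.cdr -> z ; H -> H + 1, left to right."""
    s = dict(store)
    s["x"] = Ptr(s["H"])                      # x is bound to the next free pointer
    s[(s["x"], "car")] = s.get("y", UNDEF)    # field car receives y's value
    s[(s["x"], "cdr")] = s.get("z", UNDEF)    # field cdr receives z's value
    s["H"] = s["H"] + 1                       # advance the allocation index
    return s

s0 = {"H": 0, "y": 1, "z": UNDEF}
s1 = allocate_cons(s0)                                # allocates <0>
s2 = allocate_cons({**s1, "y": 2, "z": s1["x"]})      # allocates <1>, linked to <0>
print(s2["x"], s2[(s2["x"], "car")], s2[(s2["x"], "cdr")], s2["H"])
```

Running the relation twice, with z rebound to the first record in between, builds a two-element linked structure; the rebinding of y and z between the two calls is done by hand here and is not part of the transfer relation itself.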


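Example 15 The transfer relation

\[ x \mapsto x.\mathit{tl} \;;\; x \mapsto x.\mathit{tl} \;;\; x \mapsto x.\mathit{tl} \]

is equal to the transfer relation

\[ x \mapsto x.\mathit{tl}.\mathit{tl}.\mathit{tl} \]

which in a store that includes the subgraph

\[ \bullet \xrightarrow{x} v_1 \xrightarrow{\mathit{tl}} v_2 \xrightarrow{\mathit{tl}} v_3 \xrightarrow{\mathit{tl}} v_4 \]

assigns x to be $v_4$, producing a store that includes the subgraph

\[ v_1 \xrightarrow{\mathit{tl}} v_2 \xrightarrow{\mathit{tl}} v_3 \xrightarrow{\mathit{tl}} v_4 \xleftarrow{x} \bullet \]

Although we have written the paths as linear pictures, it is not necessarily the case that $v_1$,
$v_2$, $v_3$, and $v_4$ are distinct. Therefore, the paths shown above may actually include cycles. For
instance, if $v_1 = v_3$ then the original subgraph would actually look like

\[ \bullet \xrightarrow{x} v_1 \underset{\mathit{tl}}{\overset{\mathit{tl}}{\rightleftarrows}} v_2 \]

and the transfer relation would modify this to

\[ v_1 \underset{\mathit{tl}}{\overset{\mathit{tl}}{\rightleftarrows}} v_2 \xleftarrow{x} \bullet \]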

Example 16 The transfer relation

\[ x.* \mapsto x \]

assigns field * of the value bound to x to point to that value itself, thus creating a circular data
structure in the store:

\[ \bullet \xrightarrow{x} v, \qquad v \xrightarrow{*} v \]

The C statement

*x = x;

performs a similar operation. Before the statement, x is bound to a memory address
v. After the statement, x is still bound to the memory address v, but now that memory address
holds v itself. Our graphical representation above directly reflects this memory state.


Example 17 The transfer relation

\[ x.y \mapsto z \]

transforms a store $\sigma$ as follows. Suppose x is bound to value v, y is bound to value v', and z is
bound to value v'' in $\sigma$. Then the outgoing edge of v labeled with v' is redirected to point to v''.
If v' is an integer then this is equivalent to the C statement

x[y] = z;

but if v' is not an integer then this transfer relation has no correspondence in C.

Example 18 The transfer relation

\[ x.\mathit{car} \mapsto y \;;\; z.\mathit{car} \mapsto w \]

acts as follows. For those stores in which x and z are bound to different values, it assigns field
car of x's value to be y's value and field car of z's value to be w's value. For those stores in
which x and z are bound to the same value, it assigns field car of that value to be w's value.
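
The dependence of Example 18 on aliasing can be observed with a short Python sketch. As before, the dictionary encoding of stores, the string pointers `"p1"` and `"p2"`, and the helper names are assumptions made for illustration; only deterministic operations occur, so each relation is written as a function from stores to stores.

```python
UNDEF = "undef"  # assumed stand-in for the undefined value

def assign_field(obj_var, field, src_var):
    """The relation obj_var.field -> src_var, as a store -> store function."""
    def step(store):
        new = dict(store)
        new[(store.get(obj_var, UNDEF), field)] = store.get(src_var, UNDEF)
        return new
    return step

def example_18(store):
    """x.car -> y ; z.car -> w, applied left to right."""
    after_first = assign_field("x", "car", "y")(store)
    return assign_field("z", "car", "w")(after_first)

distinct = example_18({"x": "p1", "z": "p2", "y": 10, "w": 20})
aliased = example_18({"x": "p1", "z": "p1", "y": 10, "w": 20})
print(distinct[("p1", "car")], distinct[("p2", "car")])
print(aliased[("p1", "car")])
```

When x and z are bound to different values both assignments survive; when they are bound to the same value the second assignment overwrites the first, so only w's value remains in field car.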


2.5  The  Difficulty  of  Composition 

Above, we defined a basic relation to be either the empty relation $\emptyset$, the identity relation $\bullet$, an
assignment relation $l \mapsto e$, or a filter relation $e?$. We defined any transfer relation that is not
a basic relation to be a finite composition of basic relations. Note that in Examples 12 and 15,
the composition of more than one basic relation is equal to another basic transfer relation, but
in those cases the single basic transfer relation such as

\[ x \mapsto 3 \]

exposes information that is not so clear in the composition itself.

Sometimes, though, the composition of more than one basic relation is not a basic relation,
as Examples 13, 14, and 18 demonstrate.
It would be convenient if there were a reasonably compact and clear representation scheme
for all transfer relations. The explicit composition of 100 basic relations is not only cumbersome
by any reasonable measure, but also quite likely to shed little insight on what exactly the transfer
relation does. For instance, the description of the transfer relation

\[ x.\mathit{car} \mapsto y \;;\; z.\mathit{car} \mapsto w \]

in Example 18 is not at all obvious from that representation itself. The exercise of decoding
the net effect of the transfer relation


is  much  more  difficult  yet.  It  may  seem  as  if  this  last  example  is  designed  to  destructively 
insert  the  first  element  of  a  linked-list  y  into  the  second  position  of  a  linked-list  x.  However, 
that  behavior  only  occurs  under  certain  initial  aliasing  conditions.  It  is  not  an  easy  exercise  to 
determine  the  possible  behaviors  of  this  example  under  different  initial  aliasing  conditions. 

Fortunately,  there  is  a  reasonably  simple  representation  scheme  that  covers  all  transfer 
relations  in  TrRel.  This  scheme  actually  computes  the  effect  of  any  composition,  rather  than 
leaving  the  composition  operation  explicit  as  written  directly  above,  and  hence  reveals  quite 
clearly  the  effect  of  any  transfer  relation.  But  the  four  kinds  of  basic  relations  in  TrRel  are  not 
quite  sufficient  to  express  these  compositions  syntactically.  Therefore,  we  have  to  extend  the 
language. 


2.6  The  Full  Language  of  Transfer  Relations 

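The language TR of transfer relations is defined inductively as follows.

\[
\begin{array}{rcl}
\Delta \in \mathit{TR} & ::= & \emptyset \;\mid\; \delta \;\mid\; e?\ \Delta\,|\,\Delta' \\
\delta \in \mathit{ATR} & ::= & l_1, \ldots, l_n \mapsto e_1, \ldots, e_n
\end{array}
\]

We have already defined $\emptyset$ (the empty relation). Assignment relations are generalized to parallel
assignments $\mathit{ATR} \subseteq \mathit{TR}$, defined as follows.

\[
\frac{l_i \vdash_\sigma w_i \qquad e_i \vdash_\sigma v_i \qquad i \neq j \Rightarrow w_i \neq w_j}
     {\sigma \;(l_1, \ldots, l_n \mapsto e_1, \ldots, e_n)\; (\sigma[w_1 \mapsto v_1] \ldots [w_n \mapsto v_n])}
\]

A crucial fact about assignment relations is that the assignment only takes place if all the
l-values to be assigned are actually distinct. One can therefore look at an assignment relation
with n assignments and know that whenever it relates an initial store to a final store, it performs
exactly n distinct assignments.

Filter relations are generalized to conditional relations, defined as follows.

\[
\frac{e \vdash_\sigma \mathit{true} \qquad \sigma \;\Delta\; \sigma'}{\sigma \;(e?\ \Delta\,|\,\Delta')\; \sigma'}
\qquad
\frac{e \vdash_\sigma \mathit{false} \qquad \sigma \;\Delta'\; \sigma'}{\sigma \;(e?\ \Delta\,|\,\Delta')\; \sigma'}
\]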



We  adopt  the  following  syntactic  abbreviations. 

•  The empty parallel assignment (i.e., where n = 0) is simply the identity relation $\bullet$ and
may be written as such.

•  The conditional relation $e?\ \Delta\,|\,\emptyset$ may be abbreviated as $e?\ \Delta$.

•  The  conditional  relation  e?  0  |  A  may  be  abbreviated  as  A  . 
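
The two generalized forms can be sketched in Python under the same assumptions as before: stores are dictionaries, expressions and l-expressions are functions from a store to a set of results, and a transfer relation is a function from a store to the list of related stores. The names `par_assign`, `cond`, `field`, and `eq` are hypothetical.

```python
from itertools import product

UNDEF = "undef"  # assumed stand-in for the undefined value

def var(x):
    return lambda s: {s.get(x, UNDEF)}

def field(x, name):
    """The l-expression x.name."""
    return lambda s: {(s.get(x, UNDEF), name)}

def eq(e1, e2):
    return lambda s: {a == b for a in e1(s) for b in e2(s)}

def par_assign(lexps, exps):
    """Parallel assignment l1,...,ln -> e1,...,en."""
    def rel(s):
        out = []
        for ws in product(*(l(s) for l in lexps)):
            if len(set(ws)) != len(ws):
                continue  # some l-values coincide: no assignment takes place
            for vs in product(*(e(s) for e in exps)):
                out.append({**s, **dict(zip(ws, vs))})
        return out
    return rel

def cond(e, d_true, d_false):
    """Conditional relation e? d_true | d_false."""
    def rel(s):
        out = []
        for v in e(s):
            if v is True:
                out.extend(d_true(s))
            elif v is False:
                out.extend(d_false(s))
        return out
    return rel

both = par_assign([field("x", "car"), field("z", "car")], [var("y"), var("w")])
merged = cond(eq(var("x"), var("z")),
              par_assign([field("x", "car")], [var("w")]),
              both)
aliased = {"x": "p1", "z": "p1", "y": 10, "w": 20}
print(both(aliased))                         # l-values coincide: related to no store
print(merged(aliased)[0][("p1", "car")])     # the aliased branch assigns w's value
print(par_assign([], [])({"x": 1}))          # empty assignment acts as the identity
```

The relation `merged` is a hand-written conditional that behaves like the composition in Example 18 on the sample stores tried here; it is meant only to show how the extended forms make aliasing cases explicit, not to reproduce the output of the composition algorithm developed later.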

In order to avoid confusion, we introduce different notations for syntactic and semantic
equivalence of transfer relations. If $\Delta$ and $\Delta'$ are both the same syntactic term, or in other
words the same element of the language TR, then we write $\Delta \equiv \Delta'$ and say that they are
syntactically equivalent. If $\Delta$ and $\Delta'$ denote the same relation, then we write $\Delta = \Delta'$ and
say that they are semantically equivalent. Note that syntactic equivalence obviously implies
semantic equivalence, but semantic equivalence does not necessarily imply syntactic equivalence
because in this language there may be more than one way to write the same relation. For
instance,

\[ \mathit{true}?\ \Delta\,|\,\Delta' = \Delta, \]

but

\[ \mathit{true}?\ \Delta\,|\,\Delta' \not\equiv \Delta. \]

In this sense, the language TR is not fully abstract [HP79, Mul87]. If it were fully abstract,
then we would have a decidable way of testing semantic equality of transfer relations, but this
will not be so important for the applications of our analysis framework.

A major property of transfer relations is that if all the primitive operations are deterministic,
then all transfer relations are actually partial functions. Formally, we have the following lemma.

Lemma 3 (Deterministic transfer relations) If Primop is deterministic, then for every
transfer relation $\Delta \in \mathit{TR}$ and store $\sigma \in \mathit{Store}$, there is at most one store $\sigma' \in \mathit{Store}$ such
that $\sigma \;\Delta\; \sigma'$.

Proof: Given $\sigma$, we proceed by structural induction on $\Delta$.

•  $\emptyset$: By definition, there is no $\sigma'$ such that $\sigma\,\emptyset\,\sigma'$.

•  $[e\,?\,\Delta \mid \Delta']$: Because Primop is deterministic, we know from Lemma 2 that there exists exactly one $v$ such that $e \vdash_\sigma v$. There are three cases.

   -  $v = \mathit{true}$: Then by the definition of conditional relations, $\sigma\;[e\,?\,\Delta \mid \Delta']\;\sigma'$ only if $\sigma\,\Delta\,\sigma'$, and by induction there is at most one such $\sigma'$.

   -  $v = \mathit{false}$: Analogous, with $\Delta'$.

   -  Otherwise: Then by the definition of conditional relations, there is no $\sigma'$ such that $\sigma\;[e\,?\,\Delta \mid \Delta']\;\sigma'$.

•  $l_1, \ldots, l_n \mapsto e_1, \ldots, e_n$: Because Primop is deterministic, we know from Lemma 2 that for $i \in \{1, \ldots, n\}$, there exists exactly one $w_i$ such that $l_i \vdash_\sigma w_i$ and exactly one $v_i$ such that $e_i \vdash_\sigma v_i$. Then by the definition of assignment relations, $\sigma\;[l_1, \ldots, l_n \mapsto e_1, \ldots, e_n]\;\sigma'$ only if $w_1, \ldots, w_n$ are all distinct and $\sigma' = \sigma[w_1 \mapsto v_1] \ldots [w_n \mapsto v_n]$. There is at most one such $\sigma'$.

□

This lemma has the following corollary, which is not useful on its own, but which we will use in some of the proofs in Chapter 3.

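Corollary 1 If Primop is deterministic, then for any two transfer relations $\Delta, \Delta' \in$ TR, the following two statements are equivalent:

•  $\Delta = \Delta'$

•  $(\sigma\,\Delta\,\sigma' \Rightarrow \sigma\,\Delta'\,\sigma') \wedge (\sigma\,\Delta'\,\sigma' \Rightarrow \exists \sigma''.\ \sigma\,\Delta\,\sigma'')$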

The main result about this language of transfer relations is that under certain conditions it is closed under composition. In other words, there exists a total syntactic composition function

$$\circledcirc \;\in\; \mathrm{TR} \times \mathrm{TR} \to \mathrm{TR}$$

that, given two transfer relations in the language TR, builds a third transfer relation in TR that is semantically equivalent to their composition. That is,

$$(\Delta \circledcirc \Delta') = (\Delta; \Delta')$$

for any two transfer relations $\Delta, \Delta' \in$ TR. Under weaker circumstances, $\Delta \circledcirc \Delta'$ is not guaranteed to be semantically equivalent to $\Delta; \Delta'$, but is guaranteed to be a superset of $\Delta; \Delta'$. But we will see that the conditions for semantic equivalence will be met by any application of transfer relations to model the dynamic semantics of a programming language.

In fact, the $\circledcirc$ function is effectively computable, and so we will call it the composition algorithm. If there were a combinator in the language of transfer relations that represented composition, then the composition algorithm would be trivial. In other words, if we extend the language by

$$\Delta \;::=\; \ldots \;\mid\; \Delta \circledcirc \Delta'$$

and define $\Delta \circledcirc \Delta'$ to be the relation $\Delta; \Delta'$, then the composition algorithm could be simply the $\circledcirc$ combinator. But, as we explained above, our goal for the practical purpose of program analysis is to avoid a syntactic representation of composition. We present the algorithm in the next chapter.

Composing Transfer Relations


The composition algorithm for transfer relations is based on a kind of symbolic evaluation. We will present the algorithm in several stages.

•  Symbolic  evaluation  of  primitive  operations.  This  part  of  course  depends  on  the  particular 
choice  of  the  set  Primop  of  primitive  operations  and  their  evaluation  semantics.  The  choice 
of  Primop  and  design  of  the  symbolic  evaluation  algorithms  for  those  operations  forms  the 
core  of  any  program  analysis  designed  with  our  framework. 

•  Symbolic evaluation of expressions and l-expressions. These algorithms are defined relative to Primop and its associated symbolic evaluation algorithms.

•  Symbolic evaluation of conditional relations. This part is also a parameter to the composition algorithm.

•  Symbolic  evaluation  of  assignment. 

•  Symbolic  evaluation  of  transfer-relation  composition. 


3.1  Symbolic  Evaluation  of  Primitive  Operations 

The first step of any application of our analysis methodology is the choice of the set Val of values and the set Primop of primitive operations, which will largely depend on the language to be analyzed.¹ The second step is the design of an algorithm to symbolically evaluate primitive application expressions. The heart of our methodology is in this symbolic evaluation, and the power of our approach comes from the flexible notion of a primitive operation as potentially any computation that does not modify the store, including both non-deterministic primitive operations and context-sensitive primitive operations. Yet, the fact that primitive operations are constrained not to modify the store ensures that their symbolic evaluation is never too complicated.

¹ Recall that we require Val to include the boolean constants and undef, and we require Primop to include the boolean constants and the boolean operations &, =, and if.

The framework constructed in this section is for us what the notion of the Galois connection and associated fixed-point theorem is for abstract interpretation [CC77]. In abstract interpretation, one first designs a fixed-point-based semantics for the language to be analyzed and then designs an abstraction of the semantic domain. Then, for the most part, the framework of abstract interpretation provides the rest: in particular, a functional for the abstract domain induced by the semantics whose fixed point is guaranteed to satisfy a certain relation with the semantics of the language.

In our approach, one first chooses a set of primitive operations and then designs symbolic evaluation algorithms for them. The various algorithms in this chapter provide much of the remaining work.

In Chapter 2, we gave many examples of useful primitive operations. Some of them were common and familiar, such as the constants and basic operations over booleans and integers. Others were rather distinguished, such as deref and pos. We will return to many of these in this section.

3.1.1  A  first  cut:  symbolic  evaluation  of  simple  primitive  operations 

A term like “symbolic evaluation” would tend to imply that we need an algorithm

$$P \in \mathit{Primop} \to \mathit{Exp}^* \to \mathit{Exp}$$

that satisfies the property that

$$(P\;p\;(e_1, \ldots, e_n)) \vdash_\sigma v \iff p(e_1, \ldots, e_n) \vdash_\sigma v.$$

In other words, $P$, given an $n$-ary primitive operation $p$ and $n$ expressions $e_1, \ldots, e_n$, returns an expression that is semantically equivalent to the expression $p(e_1, \ldots, e_n)$. The degenerate function

$$P\;p\;(e_1, \ldots, e_n) = p(e_1, \ldots, e_n)$$

obviously works, but might not produce optimal results. For instance, that function returns

$$P\;{+}\;(42, 24) = 42 + 24,$$

but in this case $P$ could have actually performed the addition, thus producing the smaller expression

$$P\;{+}\;(42, 24) = 66$$

where, again, 66 is technically an application of the nullary primitive operation 66. This is called “constant folding” in the compiler literature [ASU86]. Even beyond this, one could imagine that $P$ might try to use a calculus of arithmetic transformations, perhaps yielding results such as

$$P\;{+}\;(42, x + 24) = x + 66.$$

In fact, however, this notion of symbolic evaluation makes sense only for primitive operations that are simple, as defined in Definition 3. In the next section, we subsume the notion of symbolic evaluation in this section with a more general notion that covers all kinds of primitive operations.
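As a minimal sketch of this first-cut notion, the following Python fragment folds constants for a few binary integer operations and otherwise falls back on the degenerate function. The tuple encoding of expressions and the names `sym_eval` and `evaluate` are assumptions made for illustration only; the arithmetic rewriting that would turn 42 + (x + 24) into x + 66 is not attempted.

```python
import operator

# Hypothetical encoding: ("const", v), ("var", x), ("app", op, (e1, ..., en)).
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul,
            "<": operator.lt, ">": operator.gt}


def const(v):
    return ("const", v)


def var(x):
    return ("var", x)


def app(op, *args):
    return ("app", op, args)


def sym_eval(op, args):
    """P p (e1, ..., en): fold when every argument is a constant, else rebuild p(e1, ..., en)."""
    if op in FOLDABLE and all(a[0] == "const" for a in args):
        return const(FOLDABLE[op](*(a[1] for a in args)))
    return app(op, *args)


def evaluate(e, store):
    """Reference evaluator, used only to check that folding preserves meaning."""
    if e[0] == "const":
        return e[1]
    if e[0] == "var":
        return store[e[1]]
    _, op, args = e
    return FOLDABLE[op](*(evaluate(a, store) for a in args))


folded = sym_eval("+", (const(42), const(24)))
unfolded = sym_eval("+", (const(42), var("x")))
print(folded)    # the single constant 66
print(unfolded)  # the application 42 + x, left as is
```

Comparing `evaluate` on the folded and unfolded forms over a few stores gives a quick sanity check of the equivalence property stated above.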

3.1.2  Generalized  symbolic  evaluation  of  primitive  operations 

In this section we present the general notion of symbolic evaluation of primitive operations that works for all kinds of operations, including non-deterministic operations, defined in Definition 1, and context-sensitive operations, defined in Definition 2.

The symbolic evaluation of a set Primop of primitive operations is computed by a function

$$P \in \mathit{Primop} \to \mathit{Exp}^* \to \mathit{ATR} \to \mathit{Exp}$$

that, loosely speaking, given

•  an $n$-ary primitive operation $p \in$ Primop,

•  $n$ expressions $e_1, \ldots, e_n \in$ Exp, and

•  an assignment relation $\delta \in$ ATR (recall the definition of ATR from page 38),

produces an expression $e = (P\;p\;(e_1, \ldots, e_n)\;\delta)$ that satisfies the following property. If $(e_1, \ldots, e_n)$ evaluate to values $(v_1, \ldots, v_n)$ in some store $\sigma$, and if $p$ applied to these values evaluates to a value $v$ in a store after the assignment $\delta$ is applied to $\sigma$, then $e$ must evaluate to $v$ in $\sigma$.

The following definition formalizes this correctness condition.

Definition 5 (Symbolic evaluation of primitive operations) If whenever $\sigma\,\delta\,\sigma'$,

$$\Big(\bigwedge_{i=1}^{n} e_i \vdash_\sigma v_i\Big) \;\Rightarrow\; \Big(p(v_1, \ldots, v_n) \to_{\sigma'} v \;\Rightarrow\; (P\;p\;(e_1, \ldots, e_n)\;\delta) \vdash_\sigma v\Big),$$

then primitive operation $p \in$ Primop is said to be symbolically evaluated by $P$. If every $p \in$ Primop is symbolically evaluated by $P$ then $P$ is said to be a symbolic evaluation.

It is worth noting that the second implication in the above definition is not an iff. This means that if there are nondeterministic primitive operations in Primop then $P$ is allowed to produce a nondeterministic expression (which, recall, is an expression using nondeterministic primitive operations) that may evaluate to “extra” values. The reason we allow this is that the only time that we will need the reverse implication is when there are no nondeterministic primitive operations in the first place, and in that case the reverse implication comes for free. This is expressed by the following lemma.

Lemma 4 (Symbolic evaluation of deterministic operations) If all primitive operations $p \in$ Primop are deterministic, then the second implication relationship above is strengthened to an iff relationship.

Proof: Because all primitive operations are deterministic, we know that for any $p$, $v_1, \ldots, v_n$, $e_1, \ldots, e_n$, $\delta$, $\sigma$, and $\sigma'$:

•  There is exactly one $v$ such that $p(v_1, \ldots, v_n) \to_{\sigma'} v$.

•  From Lemma 2, there is exactly one $v$ such that $(P\;p\;(e_1, \ldots, e_n)\;\delta) \vdash_\sigma v$.

Therefore, an implication relationship between them is equivalent to an iff relationship. □


Special  case:  context-independent  primitive  operations 

Because context-independent operations do not use the store, $P$ can safely ignore its assignment-relation argument $\delta$ for any such operations. This is described by the following lemma.

Lemma 5 (Symbolic evaluation of context-independent operations) If $p \in$ Primop is context-independent and

$$p(e_1, \ldots, e_n) \vdash_\sigma v \;\Rightarrow\; (P\;p\;(e_1, \ldots, e_n)\;\delta) \vdash_\sigma v$$

then $p$ is symbolically evaluated by $P$.

Proof: Straightforward. □

This is similar to the statement of correctness that we suggested above for the simpler notion of symbolic evaluation. The above lemma suggests the requirement, without any loss of generality, that for any context-independent primitive operation $p$, the expression $(P\;p\;(e_1, \ldots, e_n)\;\delta)$ must not depend on the particular value of $\delta$. Because many primitive operations are context-independent, this suggests a simpler notation for their symbolic evaluation in which $\delta$ does not appear.

Definition 6 (Notation for context-independent operations) Given $P$, if $p$ is context-independent then

$$\hat{p}(e_1, \ldots, e_n)$$

denotes the unique expression $P\;p\;(e_1, \ldots, e_n)\;\delta$.

For binary primitive operations, we sometimes abbreviate $\hat{p}(e, e')$ with $e \mathbin{\hat{p}} e'$.

3.1.3  Examples 


In this section, we give examples of some of the primitive operations given in Section 2.2.2.

Context-independent primitive operations

There is no reason for any context-independent nullary primitive operation to symbolically evaluate to anything other than itself. So, for instance, we have:

$$\begin{array}{rcll}
\hat{v}() &=& v & \text{for } v \in \mathit{Val} \\
\widehat{\mathit{pos}}() &=& \mathit{pos} \\
\widehat{\mathit{bool}}() &=& \mathit{bool} & \text{(see Example 9)}
\end{array}$$

An application of a unary operation such as ptr from Example 7 also typically symbolically evaluates to itself. Recall that we write (e) for ptr(e). Then:

$$\widehat{\mathit{ptr}}(e) = (e)$$

These are clearly correct symbolic evaluations. In fact, for all context-independent primitive operations, the definition

$$\hat{p}(e_1, \ldots, e_n) = p(e_1, \ldots, e_n)$$

is trivially a symbolic evaluation by Lemma 5 because

$$p(e_1, \ldots, e_n) \vdash_\sigma v \;\Rightarrow\; p(e_1, \ldots, e_n) \vdash_\sigma v$$

simply by definition.

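But there may be some room for simplification. For instance, for binary integer operations (including comparison operations), we could define their symbolic evaluation to perform constant-folding when possible, and otherwise default to the above equation. Here, $n$ and $n'$ denote integers (nullary primitive applications), and the right-hand side of each of the first five clauses denotes the single constant obtained by performing the operation.

$$\begin{array}{rcll}
n \mathbin{\hat{+}} n' &=& n + n' \\
n \mathbin{\hat{-}} n' &=& n - n' \\
n \mathbin{\hat{*}} n' &=& n \times n' \\
n \mathbin{\hat{<}} n' &=& n < n' \\
n \mathbin{\hat{>}} n' &=& n > n' \\
e \mathbin{\hat{p}} e' &=& e \mathbin{p} e' & \text{otherwise (where } p \in \{+, -, *, <, >\})
\end{array}$$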


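Now, we have to prove that those constant-folding clauses are symbolic evaluations. We prove the case for +, writing $m$ for the integer that is the sum of $n$ and $n'$:

$$\begin{array}{rll}
 & (n + n') \vdash_\sigma v \\
\Rightarrow & v = m & \text{definition of } \vdash \text{, value primitives, and } + \\
\Rightarrow & m \vdash_\sigma v & \text{definition of value primitives} \\
\Rightarrow & (n \mathbin{\hat{+}} n') \vdash_\sigma v & \text{definition of } \hat{+}
\end{array}$$

The constant-folding rules for the other operations are analogous.

Just like the standard arithmetic operations above, the symbolic evaluation of & performs the constant-folding taken straight from its definition.

$$\begin{array}{rcll}
\mathit{true} \mathbin{\hat{\&}} e &=& e \\
e \mathbin{\hat{\&}} \mathit{true} &=& e \\
\mathit{false} \mathbin{\hat{\&}} e &=& \mathit{false} \\
e \mathbin{\hat{\&}} \mathit{false} &=& \mathit{false} \\
e \mathbin{\hat{\&}} e' &=& e \mathbin{\&} e' & \text{otherwise}
\end{array}$$

Again, the last line is trivially a symbolic evaluation, and so we need to prove the other four lines. The proofs are similar to the example shown above for +.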

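•  $\mathit{true} \mathbin{\&} e$ ($e \mathbin{\&} \mathit{true}$ analogous):

$$\begin{array}{rll}
 & (\mathit{true} \mathbin{\&} e) \vdash_\sigma v \\
\Rightarrow & e \vdash_\sigma v & \text{definition of } \vdash \text{, true, and } \& \\
\Rightarrow & (\mathit{true} \mathbin{\hat{\&}} e) \vdash_\sigma v & \text{definition of } \hat{\&}
\end{array}$$

•  $\mathit{false} \mathbin{\&} e$ ($e \mathbin{\&} \mathit{false}$ analogous):

$$\begin{array}{rll}
 & (\mathit{false} \mathbin{\&} e) \vdash_\sigma v \\
\Rightarrow & v = \mathit{false} & \text{definition of } \vdash \text{, false, and } \& \\
\Rightarrow & (\mathit{false} \mathbin{\hat{\&}} e) \vdash_\sigma v & \text{definition of } \hat{\&}
\end{array}$$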

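One can go even further for the symbolic evaluation of =. Here is a somewhat subtle definition that depends on whether the argument expressions are deterministic, as defined in Definition 4:

$$\begin{array}{rcll}
e \mathbin{\hat{=}} e &=& \mathit{true} & \text{if } \mathit{determ}(e) \\
e \mathbin{\hat{=}} e &=& \mathit{bool} & \text{if } \neg\mathit{determ}(e) \text{ (see Example 9)} \\
v \mathbin{\hat{=}} v' &=& \mathit{false} & \text{if } v \neq v' \\
e \mathbin{\hat{=}} e' &=& (e = e') & \text{otherwise}
\end{array}$$

If all primitive operations $p \in$ Primop are deterministic then all expressions are deterministic, and so the definition simplifies to:

$$\begin{array}{rcll}
e \mathbin{\hat{=}} e &=& \mathit{true} \\
v \mathbin{\hat{=}} v' &=& \mathit{false} & \text{if } v \neq v' \\
e \mathbin{\hat{=}} e' &=& (e = e') & \text{otherwise}
\end{array}$$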

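But we prove the more general formulation. Again, we need to show that the first three lines yield a symbolic evaluation.

•  If $\mathit{determ}(e)$:

$$\begin{array}{rll}
 & (e = e) \vdash_\sigma v \\
\Rightarrow & \exists v', v''.\ [e \vdash_\sigma v' \wedge e \vdash_\sigma v'' \wedge {=}(v', v'') \to v] & \text{definition of } \vdash \\
\Rightarrow & \exists v'.\ [e \vdash_\sigma v' \wedge {=}(v', v') \to v] & \text{because } \mathit{determ}(e) \\
\Rightarrow & v = \mathit{true} & \text{definition of } = \\
\Rightarrow & \mathit{true} \vdash_\sigma v & \text{definition of true} \\
\Rightarrow & (e \mathbin{\hat{=}} e) \vdash_\sigma v & \text{definition of } \hat{=}
\end{array}$$

•  If $\neg\mathit{determ}(e)$:

$$\begin{array}{rll}
 & (e = e) \vdash_\sigma v \\
\Rightarrow & v = \mathit{true} \vee v = \mathit{false} & \text{definition of } \vdash \text{ and } = \\
\Rightarrow & \mathit{bool} \vdash_\sigma v & \text{definition of bool} \\
\Rightarrow & (e \mathbin{\hat{=}} e) \vdash_\sigma v & \text{definition of } \hat{=}
\end{array}$$

•  If $v \neq v'$ (writing $v''$ for the result):

$$\begin{array}{rll}
 & (v = v') \vdash_\sigma v'' \\
\Rightarrow & {=}(v, v') \to v'' & \text{definition of value primitives and } \vdash \\
\Rightarrow & v'' = \mathit{false} & \text{definition of } = \\
\Rightarrow & \mathit{false} \vdash_\sigma v'' & \text{definition of false} \\
\Rightarrow & (v \mathbin{\hat{=}} v') \vdash_\sigma v'' & \text{definition of } \hat{=}
\end{array}$$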

For now, we present a very simple symbolic evaluation of if:

$$\begin{array}{rcll}
\widehat{\mathit{if}}(\mathit{true}, e, e') &=& e \\
\widehat{\mathit{if}}(\mathit{false}, e, e') &=& e' \\
\widehat{\mathit{if}}(e, e', e'') &=& \mathit{if}(e, e', e'') & \text{otherwise}
\end{array}$$

The proof is straightforward and similar to the proof shown above for &. However, it is often important to do a better job of simplifying conditional expressions, and we will give a more sophisticated algorithm in Chapter 9.
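The clauses for &, =, and if translate almost directly into code. The sketch below is illustrative only: the tuple encoding, the names `hat_and`, `hat_eq`, `hat_if`, and `determ`, and the assumption that `bool` is the only nondeterministic operation are all hypothetical choices, not part of the framework itself.

```python
# Hypothetical encoding: ("true",), ("false",), ("const", v), ("var", x),
# and ("app", op, (e1, ..., en)) for primitive applications.
TRUE, FALSE = ("true",), ("false",)
BOOL = ("app", "bool", ())        # nullary nondeterministic operation
NONDETERMINISTIC = {"bool"}       # assumed set of nondeterministic operation names


def is_value(e):
    return e[0] in ("const", "true", "false")


def determ(e):
    """True when e contains no nondeterministic primitive operation."""
    if e[0] == "app":
        return e[1] not in NONDETERMINISTIC and all(determ(a) for a in e[2])
    return True


def hat_and(a, b):
    if a == TRUE:
        return b
    if b == TRUE:
        return a
    if a == FALSE or b == FALSE:
        return FALSE
    return ("app", "&", (a, b))


def hat_eq(a, b):
    if a == b:
        # e = e folds to true only when e is deterministic
        return TRUE if determ(a) else BOOL
    if is_value(a) and is_value(b):
        return FALSE              # two distinct values
    return ("app", "=", (a, b))


def hat_if(test, then_e, else_e):
    if test == TRUE:
        return then_e
    if test == FALSE:
        return else_e
    return ("app", "if", (test, then_e, else_e))


x, y = ("var", "x"), ("var", "y")
print(hat_if(hat_eq(x, x), ("const", 3), y))   # the test folds to true
print(hat_if(hat_eq(x, y), ("const", 3), y))   # the test cannot be simplified
```

Keeping `true` and `false` as their own tags, rather than wrapping Python booleans in `("const", ...)`, avoids Python's `True == 1` conflating a boolean constant with an integer constant.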


Context-dependent  primitive  operations 

The symbolic evaluation of context-dependent primitive operations is much more complicated than the symbolic evaluation of context-independent operations, such as the ones shown above. As we explained above, for a context-independent operation $p$ one can always fall back on

$$\hat{p}(e_1, \ldots, e_n) = p(e_1, \ldots, e_n)$$

which is trivially a symbolic evaluation. But the symbolic evaluation of context-dependent operations needs to compute the effect of an arbitrary parallel assignment on the operation. So far, the only context-dependent operations we have seen are deref and tree. We introduced tree mainly for illustration, but on the other hand deref is a crucial operation for modeling and analyzing programming languages because, as we explained in Chapter 2, it is the only way to construct an expression that examines the components of mutable data structures. It has a rather complex symbolic evaluation because of aliasing possibilities. Let $\delta = l_1, \ldots, l_n \mapsto e''_1, \ldots, e''_n$. Then

$$P\;\mathit{deref}\;(e, e')\;\delta \;=\; \begin{array}[t]{l}
\widehat{\mathit{if}}((e \mathbin{\hat{=}} e_1) \mathbin{\hat{\&}} (e' \mathbin{\hat{=}} e'_1),\ e''_1, \\
\quad \widehat{\mathit{if}}((e \mathbin{\hat{=}} e_2) \mathbin{\hat{\&}} (e' \mathbin{\hat{=}} e'_2),\ e''_2, \\
\qquad \ldots \\
\qquad \widehat{\mathit{if}}((e \mathbin{\hat{=}} e_k) \mathbin{\hat{\&}} (e' \mathbin{\hat{=}} e'_k),\ e''_k, \\
\qquad\quad e.e') \ldots))
\end{array}$$

where the indices in the set

$$\{(e_1.e'_1, e''_1), \ldots, (e_k.e'_k, e''_k)\} = \{(l_i, e''_i) \mid i \in \{1, \ldots, n\} \wedge l_i \notin \mathit{Var}\}$$

are ordered arbitrarily. Now we show that this is a symbolic evaluation. Suppose that $\sigma\,\delta\,\sigma'$, $e \vdash_\sigma v_1$, and $e' \vdash_\sigma v_2$. By the definition of deref, if $\mathit{deref}(v_1, v_2) \to_{\sigma'} v$ then $v = \sigma'(v_1.v_2)$. We must show that

$$(P\;\mathit{deref}\;(e, e')\;\delta) \vdash_\sigma v.$$

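Recall that we overload the phrase $e.e'$ to mean not only the l-expression $e.e'$ but also the expression $\mathit{deref}(e, e')$. First, we prove two results. The first one shows how to move down the true arm of the $i$th branch in the case that there is a possible alias between $e.e'$ and the l-value $e_i.e'_i$ assigned by $\delta$.

$$\begin{array}{rll}
 & (e_i.e'_i) \vdash_\sigma (v_1.v_2) \\
\Rightarrow & e_i \vdash_\sigma v_1 \wedge e'_i \vdash_\sigma v_2 & \text{definition of } \vdash \\
\Rightarrow & (e = e_i) \vdash_\sigma \mathit{true} \wedge (e' = e'_i) \vdash_\sigma \mathit{true} & \text{definition of } = \\
\Rightarrow & (e \mathbin{\hat{=}} e_i) \vdash_\sigma \mathit{true} \wedge (e' \mathbin{\hat{=}} e'_i) \vdash_\sigma \mathit{true} & \hat{=} \text{ symbolically evaluates } = \\
\Rightarrow & ((e \mathbin{\hat{=}} e_i) \mathbin{\&} (e' \mathbin{\hat{=}} e'_i)) \vdash_\sigma \mathit{true} & \text{definition of } \& \\
\Rightarrow & ((e \mathbin{\hat{=}} e_i) \mathbin{\hat{\&}} (e' \mathbin{\hat{=}} e'_i)) \vdash_\sigma \mathit{true} & \hat{\&} \text{ symbolically evaluates } \& \\
\Rightarrow & e'' \vdash_\sigma v \Rightarrow \mathit{if}((e \mathbin{\hat{=}} e_i) \mathbin{\hat{\&}} (e' \mathbin{\hat{=}} e'_i), e'', e''') \vdash_\sigma v & \text{definition of if} \\
\Rightarrow & e'' \vdash_\sigma v \Rightarrow \widehat{\mathit{if}}((e \mathbin{\hat{=}} e_i) \mathbin{\hat{\&}} (e' \mathbin{\hat{=}} e'_i), e'', e''') \vdash_\sigma v & \widehat{\mathit{if}} \text{ symbolically evaluates if}
\end{array}$$

The second result shows how to move down the false arm of the $i$th branch in the case that there is possibly no alias between $e.e'$ and the l-value $e_i.e'_i$ assigned by $\delta$.

$$\begin{array}{rll}
 & (e_i.e'_i) \vdash_\sigma (v'_1.v'_2) \wedge (v_1 \neq v'_1 \vee v_2 \neq v'_2) \\
\Rightarrow & e_i \vdash_\sigma v'_1 \wedge e'_i \vdash_\sigma v'_2 \wedge (v_1 \neq v'_1 \vee v_2 \neq v'_2) & \text{definition of } \vdash \\
\Rightarrow & (e = e_i) \vdash_\sigma \mathit{false} \vee (e' = e'_i) \vdash_\sigma \mathit{false} & \text{definition of } = \\
\Rightarrow & (e \mathbin{\hat{=}} e_i) \vdash_\sigma \mathit{false} \vee (e' \mathbin{\hat{=}} e'_i) \vdash_\sigma \mathit{false} & \hat{=} \text{ symbolically evaluates } = \\
\Rightarrow & ((e \mathbin{\hat{=}} e_i) \mathbin{\&} (e' \mathbin{\hat{=}} e'_i)) \vdash_\sigma \mathit{false} & \text{definition of } \& \\
\Rightarrow & ((e \mathbin{\hat{=}} e_i) \mathbin{\hat{\&}} (e' \mathbin{\hat{=}} e'_i)) \vdash_\sigma \mathit{false} & \hat{\&} \text{ symbolically evaluates } \& \\
\Rightarrow & e''' \vdash_\sigma v \Rightarrow \mathit{if}((e \mathbin{\hat{=}} e_i) \mathbin{\hat{\&}} (e' \mathbin{\hat{=}} e'_i), e'', e''') \vdash_\sigma v & \text{definition of if} \\
\Rightarrow & e''' \vdash_\sigma v \Rightarrow \widehat{\mathit{if}}((e \mathbin{\hat{=}} e_i) \mathbin{\hat{\&}} (e' \mathbin{\hat{=}} e'_i), e'', e''') \vdash_\sigma v & \widehat{\mathit{if}} \text{ symbolically evaluates if}
\end{array}$$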

Now, there are two possibilities.

•  No overwrite: $\sigma(v_1.v_2) = v$ and $\delta$ did not assign to $v_1.v_2$. Then it must be the case that for $i \in \{1, \ldots, k\}$, $(e_i.e'_i) \vdash_\sigma (v'_1.v'_2)$ where $v_1 \neq v'_1$ or $v_2 \neq v'_2$. Therefore, by induction on the definition of $(P\;\mathit{deref})$ using the second result above to move down the $k$ false branches,

$$e.e' \vdash_\sigma v \;\Rightarrow\; (P\;\mathit{deref}\;(e, e')\;\delta) \vdash_\sigma v.$$

And because in this case $\sigma(v_1.v_2) = v$, by the definition of deref we indeed have that $e.e' \vdash_\sigma v$.

•  Overwrite: $\delta$ assigned $v_1.v_2$ to be $v$. Then it must be the case that for some $i \in \{1, \ldots, k\}$, $(e_i.e'_i) \vdash_\sigma (v_1.v_2)$, and for all $j < i$, $(e_j.e'_j) \vdash_\sigma (v'_1.v'_2)$ where $v_1 \neq v'_1$ or $v_2 \neq v'_2$. Therefore, by induction on the definition of $(P\;\mathit{deref})$ using the second result above to move down $i - 1$ false branches and then the first result above to move down the next true branch,

$$e''_i \vdash_\sigma v \;\Rightarrow\; (P\;\mathit{deref}\;(e, e')\;\delta) \vdash_\sigma v.$$

And because in this case $\delta$ assigned $v_1.v_2$ to be $v$, it must be the case that $e''_i \vdash_\sigma v$.


3.2  Symbolic  Evaluation  of  Expressions  and  L-expressions 

This section describes the following algorithms, which are defined relative to a set Primop of primitive operations with associated symbolic evaluation algorithm $P$.

$$E \in \mathit{Exp} \to \mathrm{TR} \to \mathit{Exp} \qquad\qquad L \in \mathit{Lexp} \to \mathrm{TR} \to \mathit{Lexp}$$

Loosely speaking, the $E$ algorithm, given an expression $e$ and a transfer relation $\Delta$, computes an expression $e'$ such that if $e'$ evaluates to a value $v$ in some store $\sigma$ then $e$ evaluates to $v$ in a store to which $\Delta$ transfers from $\sigma$ (i.e., a store $\sigma'$ such that $\sigma\,\Delta\,\sigma'$). In other words, $e'$ expresses the combined effects of $e$ and $\Delta$. The $L$ algorithm is similar, but works on l-expressions rather than expressions. Intuitively, given an l-expression $l$ and a transfer relation $\Delta$, $L$ computes an l-expression $l'$ such that if $l'$ evaluates to an l-value $w$ before $\Delta$ (that is, in $\sigma$) then $l$ evaluates to $w$ in a store to which $\Delta$ transfers from $\sigma$. We distinguish two levels of correctness of $E$ and $L$, given by the following definition.

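Definition 7 (Symbolic evaluation of expressions and l-expressions) We introduce the following terms to describe correctness properties of $E$ and $L$.

•  If whenever $\sigma\,\Delta\,\sigma'$,

$$e \vdash_{\sigma'} v \Rightarrow (E\;e\;\Delta) \vdash_\sigma v \qquad \text{(respectively, } l \vdash_{\sigma'} w \Rightarrow (L\;l\;\Delta) \vdash_\sigma w\text{)},$$

then $E$ (respectively, $L$) is said to be an upper approximation.

•  If whenever $\sigma\,\Delta\,\sigma'$,

$$e \vdash_{\sigma'} v \iff (E\;e\;\Delta) \vdash_\sigma v \qquad \text{(respectively, } l \vdash_{\sigma'} w \iff (L\;l\;\Delta) \vdash_\sigma w\text{)},$$

then $E$ (respectively, $L$) is said to be a translation.

3.2.1 The algorithm

The definition of $E$ is inductive on the structure of its arguments, and $L$ is defined in terms of $E$.

$$\begin{array}{rcl}
E\;e\;\emptyset &=& \text{any } e' \in \mathit{Exp} \\
E\;e\;[e'\,?\,\Delta \mid \emptyset] &=& E\;e\;\Delta \\
E\;e\;[e'\,?\,\emptyset \mid \Delta] &=& E\;e\;\Delta \\
E\;e\;[e'\,?\,\Delta \mid \Delta'] &=& \widehat{\mathit{if}}(e', (E\;e\;\Delta), (E\;e\;\Delta')) \\
E\;x\;[l_1, \ldots, l_n \mapsto e_1, \ldots, e_n] &=& \begin{cases} e_i & \text{if } l_i = x \\ x & \text{otherwise} \end{cases} \\
E\;(p(e_1, \ldots, e_n))\;\delta &=& P\;p\;(E\;e_1\;\delta, \ldots, E\;e_n\;\delta)\;\delta \\[1ex]
L\;x\;\Delta &=& x \\
L\;(e.e')\;\Delta &=& (E\;e\;\Delta).(E\;e'\;\Delta)
\end{array}$$

Notice that the first line allows the choice of any expression. This is because the transfer relation $\emptyset$ never outputs a store, and so any expression is trivially correct.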

Because $E$ is defined by structural induction and $L$ is defined in terms of $E$, and because the only external algorithm they need is the $P$ algorithm to symbolically evaluate primitive operations, we have that if $P$ always terminates then $E$ and $L$ always terminate.

These algorithms are not only used in the composition algorithm $\circledcirc$ to come, but they are also useful in their own right, as stand-alone applications of our analysis methodology. Chapter 9 gives an application that is centered around the $E$ algorithm.

The next two theorems prove the correctness of these algorithms. If all primitive operations $p \in$ Primop are deterministic then the algorithms are translations, and otherwise we can only show that they are upper approximations.

Theorem 1 (E and L as upper approximations) If $P$ is a symbolic evaluation then $E$ and $L$ are upper approximations.

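Proof: By the definition of upper approximation in Definition 7, we must prove that whenever $\sigma\,\Delta\,\sigma'$, the following properties hold:

•  $e \vdash_{\sigma'} v \Rightarrow (E\;e\;\Delta) \vdash_\sigma v$

•  $l \vdash_{\sigma'} w \Rightarrow (L\;l\;\Delta) \vdash_\sigma w$

We prove this by mutual structural induction on the arguments to $E$ and $L$, following their inductive definitions above. There are six cases for $E$.

•  $E\;e\;\emptyset$: By definition there is no $\sigma$ and $\sigma'$ such that $\sigma\,\emptyset\,\sigma'$, and so the theorem statement is trivially satisfied.

•  $E\;e\;[e'\,?\,\Delta \mid \emptyset]$: By the definition of conditional relations, we know that if $\sigma\;[e'\,?\,\Delta \mid \emptyset]\;\sigma'$ then $\sigma\,\Delta\,\sigma'$.

$$\begin{array}{rll}
 & e \vdash_{\sigma'} v \\
\Rightarrow & (E\;e\;\Delta) \vdash_\sigma v & \text{induction, with above observation} \\
\Rightarrow & (E\;e\;[e'\,?\,\Delta \mid \emptyset]) \vdash_\sigma v & \text{definition of } E
\end{array}$$

•  $E\;e\;[e'\,?\,\emptyset \mid \Delta]$: Analogous to the previous case.

•  $E\;e\;[e'\,?\,\Delta \mid \Delta']$: By the definition of conditional relations, we know that if $\sigma\;[e'\,?\,\Delta \mid \Delta']\;\sigma'$ then either $e' \vdash_\sigma \mathit{true}$ and $\sigma\,\Delta\,\sigma'$, or $e' \vdash_\sigma \mathit{false}$ and $\sigma\,\Delta'\,\sigma'$.

$$\begin{array}{rll}
 & e \vdash_{\sigma'} v \\
\Rightarrow & (e' \vdash_\sigma \mathit{true} \wedge (E\;e\;\Delta) \vdash_\sigma v) \vee (e' \vdash_\sigma \mathit{false} \wedge (E\;e\;\Delta') \vdash_\sigma v) & \text{induction, with above observation} \\
\Rightarrow & \mathit{if}(e', (E\;e\;\Delta), (E\;e\;\Delta')) \vdash_\sigma v & \text{definition of if} \\
\Rightarrow & \widehat{\mathit{if}}(e', (E\;e\;\Delta), (E\;e\;\Delta')) \vdash_\sigma v & P \text{ is a symbolic evaluation} \\
\Rightarrow & (E\;e\;[e'\,?\,\Delta \mid \Delta']) \vdash_\sigma v & \text{definition of } E
\end{array}$$

•  $E\;x\;[l_1, \ldots, l_n \mapsto e_1, \ldots, e_n]$: Let $\delta$ denote this assignment relation. Note that if $l_i = l_j = x$ then $i = j$, because otherwise $l_i$ and $l_j$ could not evaluate to different l-values and it could not be the case that $\sigma\,\delta\,\sigma'$.

$$\begin{array}{rll}
 & x \vdash_{\sigma'} v \\
\Rightarrow & \sigma'\,x = v & \text{definition of } \vdash \\
\Rightarrow & (l_i = x \wedge e_i \vdash_\sigma v) \vee (\sigma\,x = v \wedge \neg\exists i.\ l_i = x) & \text{definition of assignment relations} \\
\Rightarrow & (l_i = x \wedge e_i \vdash_\sigma v) \vee (x \vdash_\sigma v \wedge \neg\exists i.\ l_i = x) & \text{definition of } \vdash \\
\Rightarrow & (E\;x\;\delta) \vdash_\sigma v & \text{definition of } E
\end{array}$$

•  $E\;(p(e_1, \ldots, e_n))\;\delta$:

$$\begin{array}{rll}
 & (p(e_1, \ldots, e_n)) \vdash_{\sigma'} v \\
\Rightarrow & \exists v_1, \ldots, v_n.\ [(\bigwedge_{i=1}^{n} e_i \vdash_{\sigma'} v_i) \wedge p(v_1, \ldots, v_n) \to_{\sigma'} v] & \text{definition of } \vdash \\
\Rightarrow & \exists v_1, \ldots, v_n.\ [(\bigwedge_{i=1}^{n} (E\;e_i\;\delta) \vdash_\sigma v_i) \wedge p(v_1, \ldots, v_n) \to_{\sigma'} v] & \text{induction} \\
\Rightarrow & (P\;p\;(E\;e_1\;\delta, \ldots, E\;e_n\;\delta)\;\delta) \vdash_\sigma v & P \text{ is a symbolic evaluation} \\
\Rightarrow & (E\;(p(e_1, \ldots, e_n))\;\delta) \vdash_\sigma v & \text{definition of } E
\end{array}$$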
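There are two cases for $L$.

•  $L\;x\;\Delta$:

$$\begin{array}{rll}
 & x \vdash_{\sigma'} w \\
\Rightarrow & w = x & \text{definition of } \vdash \\
\Rightarrow & x \vdash_\sigma w & \text{definition of } \vdash \\
\Rightarrow & (L\;x\;\Delta) \vdash_\sigma w & \text{definition of } L
\end{array}$$

•  $L\;(e.e')\;\Delta$:

$$\begin{array}{rll}
 & (e.e') \vdash_{\sigma'} w \\
\Rightarrow & \exists v, v'.\ [e \vdash_{\sigma'} v \wedge e' \vdash_{\sigma'} v' \wedge w = v.v'] & \text{definition of } \vdash \\
\Rightarrow & \exists v, v'.\ [(E\;e\;\Delta) \vdash_\sigma v \wedge (E\;e'\;\Delta) \vdash_\sigma v' \wedge w = v.v'] & \text{induction} \\
\Rightarrow & ((E\;e\;\Delta).(E\;e'\;\Delta)) \vdash_\sigma w & \text{definition of } \vdash \\
\Rightarrow & (L\;(e.e')\;\Delta) \vdash_\sigma w & \text{definition of } L
\end{array}$$

□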


Theorem 2 (E and L as translations) If all primitive operations $p \in$ Primop are deterministic and $P$ is a symbolic evaluation then both $E$ and $L$ are translations.

Proof: We prove the statement for $E$. Because $P$ is a symbolic evaluation, we have from Theorem 1 that $E$ is an upper approximation. Because all primitive operations are deterministic, we know from Lemma 2 that for any $e$, $\Delta$, $\sigma$, and $\sigma'$:

•  There is exactly one $v$ such that $(E\;e\;\Delta) \vdash_\sigma v$.

•  There is exactly one $v$ such that $e \vdash_{\sigma'} v$.

Therefore, the implication relationship in the definition of upper approximation of $E$ is equivalent to an iff relationship, and therefore $E$ is a translation. The proof for $L$ is analogous.

□


3.2.2  Examples 

The $E$ algorithm not only is required for the composition algorithm $\circledcirc$ that we will present later in this chapter, but is also useful on its own for the analysis of how values relate to each other at different times of program execution. For instance, dependency analysis [ASU86] is concerned with such properties. In Chapter 9 we will see some example applications of $E$.

Here, we will give some examples of how $E$ works. The $L$ algorithm is just an application of $E$, so we will not demonstrate it separately. The simplest examples are those in which the expression given to $E$ as input does not contain any context-dependent primitives. Here are some examples, where $x \neq y$ are variables.

$$\begin{array}{rcl}
E\;x\;[\ldots] &=& x \\
E\;y\;[\ldots] &=& 3 \\
E\;(x + y)\;[\ldots] &=& x + 3 \\
E\;(y + y)\;[\ldots] &=& 6 \\
E\;(x + y)\;[\ldots] &=& 7 \\
E\;x\;[\ldots] &=& \mathit{if}(x = 3, 4, x)
\end{array}$$

A more complicated case, however, is when $E$ is given an expression that uses the context-dependent operation deref to examine the store. Recall that we overload the phrase $e.e'$ to mean not only the l-expression $e.e'$ but also the expression $\mathit{deref}(e, e')$. In the following examples, $v \neq v'$ are members of Val that are included as constant (nullary) primitive operations. Examples of such values that might occur in a real programming language are record field names, the C * token, and integers representing array indices. Also, $x \neq y \neq z$ are variables. In the following examples, some of the equality terms in the symbolic evaluation of deref are simplified to true or false due to the symbolic evaluation of equality on values ($v$ and $v'$ in this case), and thereby simplify the resulting symbolic evaluation of if.

$$\begin{array}{rcl}
E\;(x.v)\;[x \mapsto y] &=& y.v \\
E\;(x.v)\;[x.v \mapsto 3] &=& 3 \\
E\;(x.v)\;[y.v \mapsto 3] &=& \mathit{if}(x = y, 3, x.v) \\
E\;(x.v)\;[y.v' \mapsto 3] &=& x.v \\
E\;(x.v)\;[y.v, z.v \mapsto 3, 4] &=& \mathit{if}(x = y, 3, \mathit{if}(x = z, 4, x.v)) \\
E\;(x.v)\;[x.v.v \mapsto 3] &=& \mathit{if}(x = x.v, 3, x.v) \\
E\;(x.v)\;[y.v'.v \mapsto 3] &=& \mathit{if}(x = y.v', 3, x.v) \\
E\;(x.v)\;[x.v, y.v \mapsto 3, 4] &=& 3 \\
E\;(x.v)\;[y.v, x.v \mapsto 4, 3] &=& \mathit{if}(x = y, 4, 3)
\end{array}$$


The last two examples may seem strange. Recall that the symbolic evaluation of deref, given
some assignment relation, chooses an arbitrary order of its assignments and then builds a linear
sequence of nested if expressions. In the above examples, we choose the left-to-right order
to demonstrate that the order does indeed play a practical role in the quality of the output.
The penultimate example first checks $x$ for equality against $x$, which simplifies to true and
thus simplifies the entire result to 3. The justification of this general procedure is given in the
rather intricate proof of the symbolic evaluation of deref. Intuitively, in this case the output
expression does not have to check for aliasing because the semantics of assignment relations
guarantees that if $\sigma\ \boxed{x.v,\ y.v \mapsto 3,\ 4}\ \sigma'$ then $x$ and $y$ are bound to different values in $\sigma$. Because
the order of assignment is irrelevant, 3 would thus have been a correct output for the last example
as well. However, E instead outputs $\mathrm{if}(x = y,\ 4,\ 3)$. The reason is that it tests the equality of $x$ with
$y$ first, and this test cannot be simplified. This suggests that the symbolic evaluation of deref should
instead choose an order that places first any alias test that simplifies to true. In this case, the
last example would indeed output

\[
\mathbf{E}\ (x.v)\ \boxed{y.v,\ x.v \mapsto 4,\ 3} = 3.
\]
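The effect of the assignment order can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the algorithm E itself: expressions are nested tuples, every l-expression in the assignment relation has the form base.field, the variable being dereferenced is not itself reassigned, and all expressions are deterministic. The names `eval_deref`, `sym_eq`, `sym_and`, `sym_if`, and the flag `true_first` are hypothetical.

```python
def is_value(e):
    # Assumption: integers and ("val", name) tuples stand for members of Val.
    return isinstance(e, int) or (isinstance(e, tuple) and e[0] == "val")


def sym_eq(a, b):
    """Symbolic equality: decide it when possible, otherwise keep a test."""
    if a == b:                      # syntactically identical, deterministic
        return True
    if is_value(a) and is_value(b): # two distinct constants
        return False
    return ("eq", a, b)


def sym_and(a, b):
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)


def sym_if(c, t, f):
    if c is True:
        return t
    if c is False:
        return f
    return ("if", c, t, f)


def eval_deref(base, field, assigns, true_first=False):
    """Build nested ifs for base.field against [(lbase, lfield), rhs] pairs."""
    tests = [(sym_and(sym_eq(base, lb), sym_eq(field, lf)), rhs)
             for (lb, lf), rhs in assigns]
    if true_first:
        # Stable sort: alias tests that simplified to True are tried first.
        tests.sort(key=lambda t: t[0] is not True)
    result = ("deref", base, field)          # fall through to the old store
    for test, rhs in reversed(tests):
        result = sym_if(test, rhs, result)
    return result
```

With `V = ("val", "v")`, the call `eval_deref("x", V, [(("y", V), 4), (("x", V), 3)])` keeps the test on `x = y`, while passing `true_first=True` collapses the result to `3`, mirroring the reordering suggested above.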


If the second argument of a deref expression (i.e., the expression to the right of the dot) is
not a value, then the symbolic evaluations cannot perform as many simplifications. An example
of such a case that might occur in a programming language is an array access where the index
is a non-constant expression. Here are some more complicated examples, where $e \neq e'$ are
non-value expressions. We first note that nondeterministic primitive operations may produce a more
complex output expression. For instance, if determ($e$) (i.e., if $e$ contains no nondeterministic
primitive operation) then

\[
\mathbf{E}\ (x.e)\ \boxed{x.e \mapsto 3} = 3
\]

as expected, but if $\neg$determ($e$) then

\[
\mathbf{E}\ (x.e)\ \boxed{x.e \mapsto 3} = \mathrm{if}(\mathrm{bool},\ 3,\ x.e)
\]

where bool is a nondeterministic operation that evaluates to both true and false. This
is justified in the proof of the symbolic evaluation of =. Intuitively, because $e$ contains a
nondeterministic primitive operation, it may evaluate to two values $v \neq v'$, and in that case the
expression $e = e$ evaluates to both true and false.

In  the  remaining  examples,  we  assume  determ(e)  and  determ(e'). 


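\[
\begin{array}{lcl}
\mathbf{E}\ (x.e)\ \boxed{x.e' \mapsto 3} &=& \mathrm{if}(e = e',\ 3,\ x.e) \\
\mathbf{E}\ (x.e)\ \boxed{y.e \mapsto 3} &=& \mathrm{if}(x = y,\ 3,\ x.e) \\
\mathbf{E}\ (x.e)\ \boxed{y.e' \mapsto 3} &=& \mathrm{if}((x = y)\ \&\ (e = e'),\ 3,\ x.e)
\end{array}
\]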


3.3  Symbolic  Evaluation  of  Conditional  Relations 

The  remainder  of  the  composition  algorithm  is  parameterized  by  an  algorithm  that  constructs 
a  conditional  transfer  relation  from  a  conditional  expression  and  a  transfer  relation  for  each  of 
the  two  branches. 

\[
\mathbf{C} \in \mathit{Exp} \to \mathit{TR} \to \mathit{TR} \to \mathit{TR}
\]

As for E and L, we distinguish two different correctness conditions.

Definition 8 (Symbolic evaluation of conditional relations) We introduce the following
terms to describe correctness properties of C.

• If
\[
\sigma\ \boxed{e?\ \Delta \mid \Delta'}\ \sigma' \;\Rightarrow\; \sigma\ (\mathbf{C}\ e\ \Delta\ \Delta')\ \sigma'
\]
then C is said to be an upper approximation.

• If
\[
\sigma\ \boxed{e?\ \Delta \mid \Delta'}\ \sigma' \;\Leftrightarrow\; \sigma\ (\mathbf{C}\ e\ \Delta\ \Delta')\ \sigma'
\]
then C is said to be a translation.

The following lemma makes it easier to prove the stronger property about C in the case that
all primitive operations are deterministic.

Lemma 6 If all primitive operations $p \in \mathit{Primop}$ are deterministic, C is an upper
approximation, and
\[
\sigma\ (\mathbf{C}\ e\ \Delta\ \Delta')\ \sigma' \;\Rightarrow\; \exists\sigma''.\ \sigma\ \boxed{e?\ \Delta \mid \Delta'}\ \sigma''
\]
then C is a translation.

Proof: From Corollary 1. □

The most obvious choice for C is simply
\[
\mathbf{C}\ e\ \Delta\ \Delta' = \boxed{e?\ \Delta \mid \Delta'}.
\]
But it is sometimes possible to simplify the resulting transfer relation. For example,
\[
\mathbf{C}\ \mathrm{true}\ \Delta\ \Delta' = \Delta.
\]
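A minimal sketch of such a simplifying choice for C follows. The tuple representation of transfer relations and the name `cond` are hypothetical, and the second simplification (a test that is literally false selects the else-branch) is an additional assumption that parallels the one stated above.

```python
def cond(test, then_rel, else_rel):
    """Sketch of C e D D': build a conditional relation, simplifying when
    the test has already been reduced to a literal truth value."""
    if test is True:                 # C true D D' = D
        return then_rel
    if test is False:                # assumed analogue: C false D D' = D'
        return else_rel
    # The "most obvious choice": keep the conditional relation as is.
    return ("cond", test, then_rel, else_rel)
```

A more aggressive C could inspect the two branches as well, at a higher cost per composition; this is the tradeoff that the choice of C controls.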


3.4  Engineering  Flexibility 

The P and C algorithms provide engineering flexibility for the composition algorithm. There
are many correct choices for the syntactic composition $\Delta \mathbin{©} \Delta'$, but most of the differences
involve how far primitive applications are simplified and how far conditionals are simplified.
Different algorithms P and C allow a tradeoff between the cost of computing a composition
and its quality.


3.5 Symbolic Evaluation of Assignment Merging


The most difficult part of the composition algorithm, which we will present in full in Section 3.6,
is the composition of two assignment relations $\delta$ and $\delta'$. Consider the composition

\[
\boxed{x \mapsto 3}\ ;\ \boxed{y \mapsto x}
\]

The first issue is that the l-expression $y$ and the expression $x$ that occur in $\boxed{y \mapsto x}$ are evaluated
in a store after the assignment $\boxed{x \mapsto 3}$ takes place. But to build a transfer relation in TR
that is equivalent to this assignment, we must first build a corresponding l-expression and a
corresponding expression that are to be evaluated in a store before the assignment $\boxed{x \mapsto 3}$. The
L and E algorithms accomplish this task. First of all, we compute

\[
\mathbf{L}\ y\ \boxed{x \mapsto 3} = y.
\]

This expresses the fact that the l-expression $y$ evaluates to the same l-value (which happens to
be $y$) both before and after $\boxed{x \mapsto 3}$. Then, we compute

\[
\mathbf{E}\ x\ \boxed{x \mapsto 3} = 3.
\]

This expresses the fact that the expression 3 evaluates before the assignment $\boxed{x \mapsto 3}$ to the
same value to which $x$ evaluates after the assignment.

So, we replace $y$ by $y$ and $x$ by 3, yielding the assignment relation

\[
\boxed{y \mapsto 3}
\]

Now we must merge the first assignment, $\boxed{x \mapsto 3}$, with this new assignment, yielding

\[
\boxed{x,\ y \mapsto 3,\ 3}
\]

for the composition.

This “merging” is not the same as composition. Simply merging $\boxed{x \mapsto 3}$ with $\boxed{y \mapsto x}$ yields

\[
\boxed{x,\ y \mapsto 3,\ x}
\]

which is not semantically equivalent to their composition

\[
\boxed{x,\ y \mapsto 3,\ 3}
\]

The merging operation is simpler than composition, because it does not use E and L to generate
the “adjusted” expressions and l-expressions. This section presents an algorithm to perform
assignment merging, and the composition operation © will use this merging operation as a
subroutine, in the manner we described above.
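The two-step recipe (adjust with E and L, then merge) can be sketched for the special case in which every l-expression is a plain variable. Under that assumption L is the identity and E is a simultaneous substitution; the dictionary representation and the names `E` and `compose` below are hypothetical, and no primitive operations are simplified (that would be the job of P).

```python
def E(e, delta):
    """Rewrite e so that it can be evaluated before delta takes place."""
    if isinstance(e, str):           # a variable: use delta's right-hand side
        return delta.get(e, e)
    if isinstance(e, tuple):         # a primitive application, e.g. ("+", a, b)
        op, *args = e
        return (op, *(E(a, delta) for a in args))
    return e                         # a constant


def compose(delta, delta2):
    """Compose two variable-only assignment relations given as dicts."""
    # Step 1: adjust the right-hand sides of delta2 (L is the identity here).
    adjusted = {l: E(e, delta) for l, e in delta2.items()}
    # Step 2: merge, letting the later assignment win on a conflict.
    merged = dict(delta)
    merged.update(adjusted)
    return merged
```

For the example above, `compose({"x": 3}, {"y": "x"})` gives `{"x": 3, "y": 3}`, whereas merging without the adjustment step would leave `"y"` bound to the expression `"x"`.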

The reason that we need an algorithm to perform assignment merging is that it is not always
as trivial as merely concatenating two lists of l-expressions and expressions, as we did for the
example above. For example, consider merging

\[
\boxed{x,\ y \mapsto 2,\ 3}
\]

with

\[
\boxed{x,\ z \mapsto w,\ y}.
\]

The literal concatenation of these two is

\[
\boxed{x,\ y,\ x,\ z \mapsto 2,\ 3,\ w,\ y}
\]

which is semantically equivalent to the empty relation $\emptyset$ because there are two l-expressions $x$
that always evaluate to the same l-value $x$. The merging that we are looking for is

\[
\boxed{x,\ y,\ z \mapsto w,\ 3,\ y}
\]

that replaces the assignment to $x$ in the first relation with the assignment to $x$ in the second
relation.

Note once again that this is not semantically equivalent to the composition of the two
relations because the composition will perform the assignment to $y$ before evaluating $y$ in the
second assignment. In the merging of two assignment relations $\delta$ and $\delta'$, the l-expressions and
expressions in both $\delta$ and $\delta'$ are considered to be evaluated in the same initial store.
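The difference between merging and composition can be checked on concrete stores. The sketch below assumes variable-only l-expressions and right-hand sides that are either variable names or integer constants; the helper names `evaluate`, `apply`, and `merge` are hypothetical.

```python
def evaluate(e, store):
    # A right-hand side is a variable name (looked up) or a constant.
    return store[e] if isinstance(e, str) else e


def apply(assign, store):
    """Apply an assignment relation: every right-hand side is evaluated in
    the same initial store, then all updates take place."""
    new = dict(store)
    for var, e in assign.items():
        new[var] = evaluate(e, store)
    return new


def merge(first, second):
    """Merge two variable-only assignment relations; on a conflict the
    assignment from `second` replaces the one from `first`."""
    merged = dict(first)
    merged.update(second)
    return merged


first = {"x": 2, "y": 3}
second = {"x": "w", "z": "y"}
store = {"x": 0, "y": 0, "z": 0, "w": 9}

merged_result = apply(merge(first, second), store)     # z reads the old y
composed_result = apply(second, apply(first, store))   # z reads the new y
```

Here `merge(first, second)` is `{"x": "w", "y": 3, "z": "y"}`, matching the merged relation above, and the two results differ only in `z` (0 after merging, 3 after composition).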


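The merging of $\delta$ with $\delta'$ is written $\delta \mathbin{®} \delta'$. (This operator ® should not be confused with
the composition operator ©.) It is not symmetric, because, as in the example above, if $\delta$ and $\delta'$
both assign to the same l-value, the conflict is resolved in favor of $\delta'$. Recall that the semantics
of an assignment relation

\[
\delta = \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n}
\]

requires that the $n$ l-values to which $l_1, \ldots, l_n$ evaluate must be distinct in order for the
assignment to take place. Given the above assignment relation and a second assignment relation

\[
\delta' = \boxed{l'_1, \ldots, l'_m \mapsto e'_1, \ldots, e'_m},
\]

the relation $\delta \mathbin{®} \delta'$ is defined by the following rule:

\[
\frac{
\begin{array}{c}
l_i \vdash_\sigma w_i \qquad e_i \vdash_\sigma v_i \qquad i \neq j \Rightarrow w_i \neq w_j \\
l'_i \vdash_\sigma w'_i \qquad e'_i \vdash_\sigma v'_i \qquad i \neq j \Rightarrow w'_i \neq w'_j \\
\sigma' = \sigma[w_1 \mapsto v_1] \cdots [w_n \mapsto v_n][w'_1 \mapsto v'_1] \cdots [w'_m \mapsto v'_m]
\end{array}
}{
\sigma\ (\delta \mathbin{®} \delta')\ \sigma'
}
\]

Again, note that $\delta \mathbin{®} \delta'$ is not necessarily semantically equivalent to the concatenation

\[
\boxed{l_1, \ldots, l_n,\ l'_1, \ldots, l'_m \mapsto e_1, \ldots, e_n,\ e'_1, \ldots, e'_m}
\]

because the above rule allows $w_i$ to be equal to $w'_j$ for some $i$ and $j$. Intuitively, $\delta \mathbin{®} \delta'$ cannot
merely union the assignments in $\delta$ and $\delta'$ because an assignment in the latter may overwrite an
assignment in the former.

In this section, we present an algorithm ® to compute the relation ®, which is one of the most difficult
parts of the composition algorithm ©. The algorithm ® is inductive, and for ease of notation in
its correctness proof we generalize ® to take an additional parameter $J$ to reflect this induction:
a set of indices of the assignments in the right-hand relation. The following rule defines this
generalized $®_J$.

\[
\frac{
\begin{array}{c}
l_i \vdash_\sigma w_i \qquad e_i \vdash_\sigma v_i \qquad i \neq j \Rightarrow w_i \neq w_j \\
l'_i \vdash_\sigma w'_i \qquad e'_i \vdash_\sigma v'_i \qquad i \neq j \Rightarrow w'_i \neq w'_j \\
j \in J \Rightarrow w'_j \notin \{w_1, \ldots, w_n\} \\
\sigma' = \sigma[w_1 \mapsto v_1] \cdots [w_n \mapsto v_n][w'_1 \mapsto v'_1] \cdots [w'_m \mapsto v'_m]
\end{array}
}{
\sigma\ (\delta \mathbin{®_J} \delta')\ \sigma'
}
\]

In the relation $\delta \mathbin{®_J} \delta'$, $J$ is a set of indices into the list of assignments in $\delta'$. If $j \in J$, then the
$j$th l-expression in $\delta'$ must not overwrite any assignment in $\delta$. The relation $\delta \mathbin{®_\emptyset} \delta'$ is simply
$\delta \mathbin{®} \delta'$. If the length of $\delta'$ is $m$, then the relation $\delta \mathbin{®_{\{1, \ldots, m\}}} \delta'$ is semantically equivalent to the
literal concatenation of $\delta$ with $\delta'$ as we described above.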

Before we present ®, we need an auxiliary algorithm

\[
\sim\ \in \mathit{Lexp} \times \mathit{Lexp} \to \mathit{Exp}
\]

that, given two l-expressions $l$ and $l'$, generates an expression that tests whether $l$ and $l'$ can
evaluate to the same l-value or to different l-values. It is defined to be false except for the
following cases:

\[
\begin{array}{lcl}
x \sim x &=& \mathrm{true} \\
e_1.e_2 \sim e'_1.e'_2 &=& (e_1 = e'_1)\ \&\ (e_2 = e'_2)
\end{array}
\]

Formally, we have the following properties of $\sim$.

Lemma 7 If P is a symbolic evaluation, then:

• If $\exists w.\ [l \vdash_\sigma w \wedge l' \vdash_\sigma w]$ then $(l \sim l') \vdash_\sigma \mathrm{true}$.

• If $\exists w, w'.\ [w \neq w' \wedge l \vdash_\sigma w \wedge l' \vdash_\sigma w']$ then $(l \sim l') \vdash_\sigma \mathrm{false}$.

Proof: Straightforward. □
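The definition of the alias test transcribes almost directly into code. In this sketch (the representation is an assumption) an l-expression is either a variable name or a pair `(e1, e2)` standing for e1.e2, and the function `alias_test` is a hypothetical name. The conjunction is returned unsimplified, since deciding the two equalities is left to the symbolic evaluation P.

```python
def alias_test(l1, l2):
    """Sketch of l1 ~ l2 following the two defining cases."""
    if isinstance(l1, str) and isinstance(l2, str):
        # x ~ x = true; two distinct variables never name the same l-value.
        return l1 == l2
    if isinstance(l1, tuple) and isinstance(l2, tuple):
        (b1, f1), (b2, f2) = l1, l2
        # e1.e2 ~ e1'.e2' = (e1 = e1') & (e2 = e2')
        return ("and", ("eq", b1, b2), ("eq", f1, f2))
    # A variable and a dereference: false, like every remaining case.
    return False
```

For instance, `alias_test(("x", "v"), ("y", "v"))` yields the test `(x = y) & (v = v)`, whose second conjunct a symbolic evaluation would reduce to true.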

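Now we present the algorithm

\[
® \in \mathit{ATR} \times \mathit{ATR} \times \mathcal{P}_{\mathrm{fin}}(\mathit{Nat}) \to \mathit{TR}
\]

to compute the relation ® as follows:

\[
\begin{array}{lcl}
\bullet \mathbin{®_J} \delta' &=& \delta' \\
\boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n} \mathbin{®_J} \delta' &=&
\mathbf{C}\ (l_1 \sim l'_{j_1})\ \Delta_1\ (\mathbf{C}\ (l_1 \sim l'_{j_2})\ \Delta_2\ (\cdots (\mathbf{C}\ (l_1 \sim l'_{j_k})\ \Delta_k\ \Delta_0) \cdots))
\end{array}
\]

where

\[
\begin{array}{lcl}
\delta' &=& \boxed{l'_1, \ldots, l'_m \mapsto e'_1, \ldots, e'_m} \\
\{j_1, \ldots, j_k\} &=& \{1, \ldots, m\} - J \quad \text{ordered arbitrarily} \\
\Delta_0 &=& \boxed{l_2, \ldots, l_n \mapsto e_2, \ldots, e_n} \mathbin{®_{J \cup \{m+1\}}} \boxed{l'_1, \ldots, l'_m, l_1 \mapsto e'_1, \ldots, e'_m, e_1} \\
\Delta_i &=& \boxed{l_2, \ldots, l_n \mapsto e_2, \ldots, e_n} \mathbin{®_{J \cup \{j_i\}}} \delta'
\end{array}
\]

The order of $j_1, \ldots, j_k$ is arbitrary; the correctness proof will make no assumption as to their
order. There may be engineering advantages to choosing a particular order dynamically, because
a particular choice of C might produce different results with different orderings. This is similar
to the situation with the symbolic evaluation of deref that we illustrated with the examples in
Section 3.2.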


Intuitively, $\delta \mathbin{®} \delta'$ examines each assignment of $\delta$ in turn, to see which ones might be
overwritten by $\delta'$ and thus should be eliminated, and which ones might not be overwritten by $\delta'$ and
thus should remain. The assignments in $\delta$ are so processed from left to right. The l-expression
of each one is tested in turn, via $\sim$, against the l-expressions of $\delta'$ not already in the set $J$.
Whenever an l-expression in $\delta$ might be equal to some l-expression $l'_j$ in $\delta'$, that l-expression
never needs to be tested for equivalence again, and so $j$ is added to $J$. It is this handling of $J$
that is rather subtle, but the correctness proof explains this in detail.

Because ® is defined by structural induction, and because the only external algorithms
it needs are the P algorithm to symbolically evaluate the primitive operations in $\sim$ and the
C algorithm to symbolically evaluate conditional relations, we have that if P and C always
terminate then ® always terminates.
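The left-to-right processing just described can be sketched as a small recursive function. This is an illustration under stated assumptions rather than the algorithm as defined: assignment relations are lists of (l-expression, expression) pairs, l-expressions are variable names or (base, field) pairs with constant field names, the alias test and C are the simplest possible stand-ins, and all function names are hypothetical.

```python
def alias_test(l1, l2):
    """Simplified stand-in for ~: True, False, or a symbolic equality."""
    if isinstance(l1, str) and isinstance(l2, str):
        return l1 == l2
    if isinstance(l1, tuple) and isinstance(l2, tuple):
        (b1, f1), (b2, f2) = l1, l2
        if f1 != f2:                 # assumption: fields are constants
            return False
        return True if b1 == b2 else ("eq", b1, b2)
    return False


def cond(test, then_rel, else_rel):
    """Stand-in for C with the obvious simplifications."""
    if test is True:
        return then_rel
    if test is False:
        return else_rel
    return ("cond", test, then_rel, else_rel)


def merge(delta, delta2, J=frozenset()):
    """Merge delta into delta2. J holds the 1-based indices of assignments in
    delta2 that must not overwrite any remaining assignment of delta."""
    if not delta:
        return ("assign", tuple(delta2))
    first, rest = delta[0], delta[1:]
    m = len(delta2)
    # Default branch: nothing in delta2 overwrites `first`, so it is kept
    # (moved to the end of delta2 and never tested again).
    result = merge(rest, delta2 + [first], J | {m + 1})
    for j in reversed([j for j in range(1, m + 1) if j not in J]):
        test = alias_test(first[0], delta2[j - 1][0])
        if test is not False:
            # `first` may be overwritten by assignment j, so it is dropped.
            result = cond(test, merge(rest, delta2, J | {j}), result)
    return result
```

On the running example, `merge([("x", 2), ("y", 3)], [("x", "w"), ("z", "y")])` returns the single assignment relation binding x to w, z to y, and y to 3. When an alias test stays symbolic, as for x.v against y.v, the result is a conditional relation with one branch per outcome. The sketch builds the default branch eagerly even when it is later discarded, which a practical implementation would avoid.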

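Now we may proceed with the proof of ®. First we show that the algorithm ® computes a relation that
includes the relation ®.


Lemma 8 If P is a symbolic evaluation, C is an upper approximation, and $\sigma\ (\delta \mathbin{®_J} \delta')\ \sigma'$
holds for the merge relation $®_J$, then $\sigma\ (\delta \mathbin{®_J} \delta')\ \sigma'$ holds for the transfer relation computed
by the algorithm $®_J$.


Proof: By induction on the size of $\delta$. If $\delta = \bullet$ then the result is immediate. Otherwise,
without loss of generality, let

\[
\begin{array}{lcl}
\delta &=& \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n} \\
\delta' &=& \boxed{l'_1, \ldots, l'_m \mapsto e'_1, \ldots, e'_m}
\end{array}
\]

and let $\{j_1, \ldots, j_k\} = \{1, \ldots, m\} - J$, where the order of $j_1, \ldots, j_k$ is arbitrary. Let $w_1, \ldots, w_n$,
$v_1, \ldots, v_n$, $w'_1, \ldots, w'_m$, and $v'_1, \ldots, v'_m$ be as given by the definition of $®_J$. By that definition,
we know that

• $w_1 \notin \{w_2, \ldots, w_n\}$, and

• $w_1 \notin \{w'_j \mid j \in J\}$.

There are two cases.

• Case 1: There is some $j \notin J$ such that $w_1 = w'_j$. Because P is a symbolic evaluation we
have by Lemma 7 that

\[
(l_1 \sim l'_j) \vdash_\sigma \mathrm{true}.
\]

In this case, the update to $w_1$ in store $\sigma$ is overwritten by the later update to $w'_j$, and
so in this case it may be removed from the definition of $\sigma'$ in the rule that defines $®_J$.
Furthermore, because $w_1 = w'_j$ we have that $w'_j \notin \{w_2, \ldots, w_n\}$, and so $j$ may be added
to $J$ in the rule that defines $®_J$. Therefore, for the merge relation,

\[
\sigma\ (\boxed{l_2, \ldots, l_n \mapsto e_2, \ldots, e_n} \mathbin{®_{J \cup \{j\}}} \delta')\ \sigma'.
\]

By induction, we have that the same holds for the relation computed by the algorithm:

\[
\sigma\ (\boxed{l_2, \ldots, l_n \mapsto e_2, \ldots, e_n} \mathbin{®_{J \cup \{j\}}} \delta')\ \sigma'.
\]

• Case 2: There is no $j \notin J$ such that $w_1 = w'_j$, and so

\[
\bigwedge_{j \notin J} (l_1 \sim l'_j) \vdash_\sigma \mathrm{false}.
\]

In this case, $w_1 \notin \{w_2, \ldots, w_n, w'_1, \ldots, w'_m\}$. Hence, the update of $w_1$ to $v_1$ in store $\sigma$ is
not overwritten and thus may be moved to the end of the list of updates in the definition
of $®_J$. Therefore, for the merge relation,

\[
\sigma\ (\boxed{l_2, \ldots, l_n \mapsto e_2, \ldots, e_n} \mathbin{®_{J \cup \{m+1\}}} \boxed{l'_1, \ldots, l'_m, l_1 \mapsto e'_1, \ldots, e'_m, e_1})\ \sigma'.
\]

By induction, we have that the same holds for the relation computed by the algorithm:

\[
\sigma\ (\boxed{l_2, \ldots, l_n \mapsto e_2, \ldots, e_n} \mathbin{®_{J \cup \{m+1\}}} \boxed{l'_1, \ldots, l'_m, l_1 \mapsto e'_1, \ldots, e'_m, e_1})\ \sigma'.
\]

Therefore, because C is an upper approximation, either all of the branches in the result of the algorithm
$\delta \mathbin{®_J} \delta'$ will evaluate to false in $\sigma$, in which case $\sigma\ (\delta \mathbin{®_J} \delta')\ \sigma'$ by Case 2, or at least one of the
branches will evaluate to true in $\sigma$, in which case $\sigma\ (\delta \mathbin{®_J} \delta')\ \sigma'$ by Case 1. □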

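Now we show that if all primitive operations are deterministic, the algorithm ® computes a relation that
is precisely the relation ®.

Lemma 9 If all primitive operations $p \in \mathit{Primop}$ are deterministic, P is a symbolic evaluation,
C is a translation, and $\sigma\ (\delta \mathbin{®_J} \delta')\ \sigma'$ holds for the transfer relation computed by the algorithm $®_J$,
then $\sigma\ (\delta \mathbin{®_J} \delta')\ \sigma'$ holds for the merge relation $®_J$.

Proof: By induction on the size of $\delta$. If $\delta = \bullet$ then the result is immediate. Otherwise,
without loss of generality, let

\[
\begin{array}{lcl}
\delta &=& \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n} \\
\delta' &=& \boxed{l'_1, \ldots, l'_m \mapsto e'_1, \ldots, e'_m}
\end{array}
\]

and let $\{j_1, \ldots, j_k\} = \{1, \ldots, m\} - J$, where the order of $j_1, \ldots, j_k$ is arbitrary. Because all
primitive operations are deterministic, we know from Lemma 2 that each l-expression (expression)
evaluates to a unique l-value (value). Let $w_1, \ldots, w_n$, $v_1, \ldots, v_n$, $w'_1, \ldots, w'_m$, and $v'_1, \ldots, v'_m$
correspond to $\delta$ and $\delta'$ as shown above. There are two cases.

• Case 1: There is some $j \notin J$ such that

\[
(l_1 \sim l'_j) \vdash_\sigma \mathrm{true}
\]

and, for the relation computed by the algorithm,

\[
\sigma\ (\boxed{l_2, \ldots, l_n \mapsto e_2, \ldots, e_n} \mathbin{®_{J \cup \{j\}}} \delta')\ \sigma'.
\]

By induction, the same holds for the merge relation $®_{J \cup \{j\}}$.
Hence, by the definition of $®_{J \cup \{j\}}$ we have that

– $w_2, \ldots, w_n$ are distinct,

– $w'_1, \ldots, w'_m$ are distinct,

– $k \in J \Rightarrow w'_k \notin \{w_2, \ldots, w_n\}$, and

– $w'_j \notin \{w_2, \ldots, w_n\}$.

But because P is a symbolic evaluation, we have by Lemma 7 that $w_1 = w'_j$. Hence,

– $w_1 \notin \{w_2, \ldots, w_n\}$,

– $k \in J \Rightarrow w'_k \neq w_1$, and

– an assignment to $w'_j$ overwrites an earlier assignment to $w_1$.

Therefore, by the definition of $®_J$,

\[
\sigma\ (\delta \mathbin{®_J} \delta')\ \sigma'.
\]

• Case 2: For the relation computed by the algorithm,

\[
\sigma\ (\boxed{l_2, \ldots, l_n \mapsto e_2, \ldots, e_n} \mathbin{®_{J \cup \{m+1\}}} \boxed{l'_1, \ldots, l'_m, l_1 \mapsto e'_1, \ldots, e'_m, e_1})\ \sigma'.
\]

By induction, the same holds for the merge relation $®_{J \cup \{m+1\}}$.
Hence, by the definition of $®_{J \cup \{m+1\}}$ we have that

– $w_2, \ldots, w_n$ are distinct,

– $w'_1, \ldots, w'_m, w_1$ are distinct,

– $k \in J \Rightarrow w'_k \notin \{w_2, \ldots, w_n\}$, and

– $w_1 \notin \{w_2, \ldots, w_n\}$.

Therefore, because the assignment to $w_1$ does not overwrite any preceding assignment, it
may be moved to the front, and hence by the definition of $®_J$,

\[
\sigma\ (\delta \mathbin{®_J} \delta')\ \sigma'.
\]

□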


3.6  The  Composition  Operation 

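Finally, we are ready to present the syntactic composition operation

\[
© \in \mathit{TR} \times \mathit{TR} \to \mathit{TR}
\]

Definition 9 (Syntactic composition of transfer relations) We introduce the following
terms to describe correctness properties of ©.

• If
\[
\sigma\ (\Delta; \Delta')\ \sigma' \;\Rightarrow\; \sigma\ (\Delta \mathbin{©} \Delta')\ \sigma',
\]
then © is said to be an upper approximation.

• If
\[
\sigma\ (\Delta; \Delta')\ \sigma' \;\Leftrightarrow\; \sigma\ (\Delta \mathbin{©} \Delta')\ \sigma',
\]
then © is said to be a translation.

The definition of the © algorithm is as follows.

\[
\begin{array}{lcl}
\emptyset \mathbin{©} \Delta &=& \emptyset \\
\Delta \mathbin{©} \emptyset &=& \emptyset \\
\boxed{e?\ \Delta \mid \Delta'} \mathbin{©} \Delta'' &=& \mathbf{C}\ e\ (\Delta \mathbin{©} \Delta'')\ (\Delta' \mathbin{©} \Delta'') \\
\delta \mathbin{©} \boxed{e?\ \Delta \mid \Delta'} &=& \mathbf{C}\ (\mathbf{E}\ e\ \delta)\ (\delta \mathbin{©} \Delta)\ (\delta \mathbin{©} \Delta') \\
\delta \mathbin{©} \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n} &=& \delta \mathbin{®_\emptyset} \boxed{\mathbf{L}\ l_1\ \delta, \ldots, \mathbf{L}\ l_n\ \delta \mapsto \mathbf{E}\ e_1\ \delta, \ldots, \mathbf{E}\ e_n\ \delta}
\end{array}
\]

We have shown that if the P and C algorithms terminate then the E, L, and ® algorithms terminate.
So because © is defined by structural induction, it thus always terminates.

We will give some examples of the composition algorithm later when we use transfer relations
to model the semantics of programming languages. For the remainder of this chapter we give
the correctness proofs of ©.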

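Theorem 3 (© as an upper approximation) If P is a symbolic evaluation and C is an
upper approximation then © is an upper approximation.


Proof: Because P is a symbolic evaluation, we have from Theorem 1 that E and L are upper
approximations. We proceed by structural induction. There are five cases.

• $(\emptyset; \Delta) = \emptyset = (\emptyset \mathbin{©} \Delta)$

• $(\Delta; \emptyset) = \emptyset = (\Delta \mathbin{©} \emptyset)$

• $(\boxed{e?\ \Delta \mid \Delta'}; \Delta'')$:

\[
\begin{array}{cll}
 & \sigma\ (\boxed{e?\ \Delta \mid \Delta'}; \Delta'')\ \sigma'' & \\
\Rightarrow & \exists\sigma'.\ [\sigma\ \boxed{e?\ \Delta \mid \Delta'}\ \sigma' \wedge \sigma'\ \Delta''\ \sigma''] & \text{relation composition} \\
\Rightarrow & \exists\sigma'.\ [((e \vdash_\sigma \mathrm{true} \wedge \sigma\ \Delta\ \sigma') \vee (e \vdash_\sigma \mathrm{false} \wedge \sigma\ \Delta'\ \sigma')) \wedge \sigma'\ \Delta''\ \sigma''] & \text{definition of conditional relations} \\
\Rightarrow & (e \vdash_\sigma \mathrm{true} \wedge \sigma\ (\Delta; \Delta'')\ \sigma'') \vee (e \vdash_\sigma \mathrm{false} \wedge \sigma\ (\Delta'; \Delta'')\ \sigma'') & \text{relation composition} \\
\Rightarrow & (e \vdash_\sigma \mathrm{true} \wedge \sigma\ (\Delta \mathbin{©} \Delta'')\ \sigma'') \vee (e \vdash_\sigma \mathrm{false} \wedge \sigma\ (\Delta' \mathbin{©} \Delta'')\ \sigma'') & \text{induction} \\
\Rightarrow & \sigma\ \boxed{e?\ (\Delta \mathbin{©} \Delta'') \mid (\Delta' \mathbin{©} \Delta'')}\ \sigma'' & \text{definition of conditional relations} \\
\Rightarrow & \sigma\ (\mathbf{C}\ e\ (\Delta \mathbin{©} \Delta'')\ (\Delta' \mathbin{©} \Delta''))\ \sigma'' & \text{assumption about C} \\
\Rightarrow & \sigma\ (\boxed{e?\ \Delta \mid \Delta'} \mathbin{©} \Delta'')\ \sigma'' & \text{definition of ©}
\end{array}
\]

• $(\delta; \boxed{e?\ \Delta \mid \Delta'})$: Let $e' = (\mathbf{E}\ e\ \delta)$.

\[
\begin{array}{cll}
 & \sigma\ (\delta; \boxed{e?\ \Delta \mid \Delta'})\ \sigma'' & \\
\Rightarrow & \exists\sigma'.\ [\sigma\ \delta\ \sigma' \wedge \sigma'\ \boxed{e?\ \Delta \mid \Delta'}\ \sigma''] & \text{relation composition} \\
\Rightarrow & \exists\sigma'.\ [\sigma\ \delta\ \sigma' \wedge ((e \vdash_{\sigma'} \mathrm{true} \wedge \sigma'\ \Delta\ \sigma'') \vee (e \vdash_{\sigma'} \mathrm{false} \wedge \sigma'\ \Delta'\ \sigma''))] & \text{definition of conditional relations} \\
\Rightarrow & \exists\sigma'.\ [\sigma\ \delta\ \sigma' \wedge ((e' \vdash_\sigma \mathrm{true} \wedge \sigma'\ \Delta\ \sigma'') \vee (e' \vdash_\sigma \mathrm{false} \wedge \sigma'\ \Delta'\ \sigma''))] & \text{E is an upper approximation} \\
\Rightarrow & (e' \vdash_\sigma \mathrm{true} \wedge \sigma\ (\delta; \Delta)\ \sigma'') \vee (e' \vdash_\sigma \mathrm{false} \wedge \sigma\ (\delta; \Delta')\ \sigma'') & \text{relation composition} \\
\Rightarrow & (e' \vdash_\sigma \mathrm{true} \wedge \sigma\ (\delta \mathbin{©} \Delta)\ \sigma'') \vee (e' \vdash_\sigma \mathrm{false} \wedge \sigma\ (\delta \mathbin{©} \Delta')\ \sigma'') & \text{induction} \\
\Rightarrow & \sigma\ \boxed{e'?\ (\delta \mathbin{©} \Delta) \mid (\delta \mathbin{©} \Delta')}\ \sigma'' & \text{definition of conditional relations} \\
\Rightarrow & \sigma\ (\mathbf{C}\ e'\ (\delta \mathbin{©} \Delta)\ (\delta \mathbin{©} \Delta'))\ \sigma'' & \text{assumption about C} \\
\Rightarrow & \sigma\ (\delta \mathbin{©} \boxed{e?\ \Delta \mid \Delta'})\ \sigma'' & \text{definition of ©}
\end{array}
\]

• $(\delta; \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n})$: For $i \in \{1, \ldots, n\}$ let $l'_i = (\mathbf{L}\ l_i\ \delta)$ and $e'_i = (\mathbf{E}\ e_i\ \delta)$.

\[
\begin{array}{cll}
 & \sigma\ (\delta; \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n})\ \sigma'' & \\
\Rightarrow & \exists\sigma'.\ [\sigma\ \delta\ \sigma' \wedge \sigma'\ \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n}\ \sigma''] & \text{relation composition} \\
\Rightarrow & \exists\sigma'.\ [\sigma\ \delta\ \sigma' \wedge \exists w_1, \ldots, w_n, v_1, \ldots, v_n.\ [(i \neq j \Rightarrow w_i \neq w_j) & \\
 & \quad \wedge\ (\bigwedge_{i=1}^{n} l_i \vdash_{\sigma'} w_i \wedge e_i \vdash_{\sigma'} v_i) \wedge \sigma'' = \sigma'[w_1 \mapsto v_1] \cdots [w_n \mapsto v_n]]] & \text{definition of assignment relations} \\
\Rightarrow & \exists\sigma'.\ [\sigma\ \delta\ \sigma' \wedge \exists w_1, \ldots, w_n, v_1, \ldots, v_n.\ [(i \neq j \Rightarrow w_i \neq w_j) & \\
 & \quad \wedge\ (\bigwedge_{i=1}^{n} l'_i \vdash_{\sigma} w_i \wedge e'_i \vdash_{\sigma} v_i) \wedge \sigma'' = \sigma'[w_1 \mapsto v_1] \cdots [w_n \mapsto v_n]]] & \text{E and L are upper approximations} \\
\Rightarrow & \sigma\ (\delta \mathbin{®_\emptyset} \boxed{l'_1, \ldots, l'_n \mapsto e'_1, \ldots, e'_n})\ \sigma'' \quad \text{(merge relation)} & \text{definition of ®} \\
\Rightarrow & \sigma\ (\delta \mathbin{®_\emptyset} \boxed{l'_1, \ldots, l'_n \mapsto e'_1, \ldots, e'_n})\ \sigma'' \quad \text{(merge algorithm)} & \text{Lemma 8} \\
\Rightarrow & \sigma\ (\delta \mathbin{©} \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n})\ \sigma'' & \text{definition of ©}
\end{array}
\]

□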


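Theorem 4 (© as a translation) If all primitive operations $p \in \mathit{Primop}$ are deterministic, P
is a symbolic evaluation, and C is a translation, then © is a translation.


Proof: We know from Theorem 3 that © is an upper approximation. Therefore, from
Corollary 1, we need only show that

\[
\sigma\ (\Delta \mathbin{©} \Delta')\ \sigma' \;\Rightarrow\; \exists\sigma''.\ \sigma\ (\Delta; \Delta')\ \sigma''
\]

to establish that © is a translation. From Theorem 2 we have that E and L are translations.
We proceed by structural induction. There are five cases.

• $(\emptyset \mathbin{©} \Delta) = \emptyset$, and so $\sigma\ (\emptyset \mathbin{©} \Delta)\ \sigma'$ must be false.

• $(\Delta \mathbin{©} \emptyset) = \emptyset$, and so $\sigma\ (\Delta \mathbin{©} \emptyset)\ \sigma'$ must be false.

• $(\boxed{e?\ \Delta \mid \Delta'} \mathbin{©} \Delta'')$:

\[
\begin{array}{cll}
 & \sigma\ (\boxed{e?\ \Delta \mid \Delta'} \mathbin{©} \Delta'')\ \sigma' & \\
\Rightarrow & \sigma\ (\mathbf{C}\ e\ (\Delta \mathbin{©} \Delta'')\ (\Delta' \mathbin{©} \Delta''))\ \sigma' & \text{definition of ©} \\
\Rightarrow & \sigma\ \boxed{e?\ (\Delta \mathbin{©} \Delta'') \mid (\Delta' \mathbin{©} \Delta'')}\ \sigma' & \text{C is a translation} \\
\Rightarrow & (e \vdash_\sigma \mathrm{true} \wedge \sigma\ (\Delta \mathbin{©} \Delta'')\ \sigma') \vee (e \vdash_\sigma \mathrm{false} \wedge \sigma\ (\Delta' \mathbin{©} \Delta'')\ \sigma') & \text{defn. of conditional relations} \\
\Rightarrow & (e \vdash_\sigma \mathrm{true} \wedge \exists\sigma''.\ \sigma\ (\Delta; \Delta'')\ \sigma'') \vee (e \vdash_\sigma \mathrm{false} \wedge \exists\sigma''.\ \sigma\ (\Delta'; \Delta'')\ \sigma'') & \text{induction} \\
\Rightarrow & (e \vdash_\sigma \mathrm{true} \wedge \exists\sigma'.\ \sigma\ \Delta\ \sigma' \wedge \exists\sigma''.\ \sigma'\ \Delta''\ \sigma'') & \\
 & \quad \vee\ (e \vdash_\sigma \mathrm{false} \wedge \exists\sigma'.\ \sigma\ \Delta'\ \sigma' \wedge \exists\sigma''.\ \sigma'\ \Delta''\ \sigma'') & \text{relation composition} \\
\Rightarrow & \exists\sigma', \sigma''.\ [((e \vdash_\sigma \mathrm{true} \wedge \sigma\ \Delta\ \sigma') \vee (e \vdash_\sigma \mathrm{false} \wedge \sigma\ \Delta'\ \sigma')) \wedge \sigma'\ \Delta''\ \sigma''] & \text{distributivity} \\
\Rightarrow & \exists\sigma', \sigma''.\ [\sigma\ \boxed{e?\ \Delta \mid \Delta'}\ \sigma' \wedge \sigma'\ \Delta''\ \sigma''] & \text{defn. of conditional relations} \\
\Rightarrow & \exists\sigma''.\ \sigma\ (\boxed{e?\ \Delta \mid \Delta'}; \Delta'')\ \sigma'' & \text{relation composition}
\end{array}
\]

• $(\delta \mathbin{©} \boxed{e?\ \Delta \mid \Delta'})$: Let $e' = (\mathbf{E}\ e\ \delta)$.

\[
\begin{array}{cll}
 & \sigma\ (\delta \mathbin{©} \boxed{e?\ \Delta \mid \Delta'})\ \sigma' & \\
\Rightarrow & \sigma\ (\mathbf{C}\ e'\ (\delta \mathbin{©} \Delta)\ (\delta \mathbin{©} \Delta'))\ \sigma' & \text{definition of ©} \\
\Rightarrow & \sigma\ \boxed{e'?\ (\delta \mathbin{©} \Delta) \mid (\delta \mathbin{©} \Delta')}\ \sigma' & \text{C is a translation} \\
\Rightarrow & (e' \vdash_\sigma \mathrm{true} \wedge \sigma\ (\delta \mathbin{©} \Delta)\ \sigma') \vee (e' \vdash_\sigma \mathrm{false} \wedge \sigma\ (\delta \mathbin{©} \Delta')\ \sigma') & \text{defn. of conditional relations} \\
\Rightarrow & (e' \vdash_\sigma \mathrm{true} \wedge \exists\sigma''.\ \sigma\ (\delta; \Delta)\ \sigma'') \vee (e' \vdash_\sigma \mathrm{false} \wedge \exists\sigma''.\ \sigma\ (\delta; \Delta')\ \sigma'') & \text{induction} \\
\Rightarrow & \exists\sigma', \sigma''.\ [\sigma\ \delta\ \sigma' \wedge ((e' \vdash_\sigma \mathrm{true} \wedge \sigma'\ \Delta\ \sigma'') \vee (e' \vdash_\sigma \mathrm{false} \wedge \sigma'\ \Delta'\ \sigma''))] & \text{relation composition} \\
\Rightarrow & \exists\sigma', \sigma''.\ [\sigma\ \delta\ \sigma' \wedge ((e \vdash_{\sigma'} \mathrm{true} \wedge \sigma'\ \Delta\ \sigma'') \vee (e \vdash_{\sigma'} \mathrm{false} \wedge \sigma'\ \Delta'\ \sigma''))] & \text{E is a translation} \\
\Rightarrow & \exists\sigma', \sigma''.\ [\sigma\ \delta\ \sigma' \wedge \sigma'\ \boxed{e?\ \Delta \mid \Delta'}\ \sigma''] & \text{defn. of conditional relations} \\
\Rightarrow & \exists\sigma''.\ \sigma\ (\delta; \boxed{e?\ \Delta \mid \Delta'})\ \sigma'' & \text{relation composition}
\end{array}
\]

• $(\delta \mathbin{©} \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n})$: For $i \in \{1, \ldots, n\}$ let $l'_i = (\mathbf{L}\ l_i\ \delta)$ and $e'_i = (\mathbf{E}\ e_i\ \delta)$.

\[
\begin{array}{cll}
 & \sigma\ (\delta \mathbin{©} \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n})\ \sigma'' & \\
\Rightarrow & \sigma\ (\delta \mathbin{®_\emptyset} \boxed{l'_1, \ldots, l'_n \mapsto e'_1, \ldots, e'_n})\ \sigma'' \quad \text{(merge algorithm)} & \text{definition of ©} \\
\Rightarrow & \sigma\ (\delta \mathbin{®_\emptyset} \boxed{l'_1, \ldots, l'_n \mapsto e'_1, \ldots, e'_n})\ \sigma'' \quad \text{(merge relation)} & \text{Lemma 9} \\
\Rightarrow & \exists\sigma', w_1, \ldots, w_n.\ [\sigma\ \delta\ \sigma' \wedge (i \neq j \Rightarrow w_i \neq w_j) \wedge \bigwedge_{i=1}^{n} l'_i \vdash_{\sigma} w_i] & \text{definition of ®} \\
\Rightarrow & \exists\sigma', w_1, \ldots, w_n.\ [\sigma\ \delta\ \sigma' \wedge (i \neq j \Rightarrow w_i \neq w_j) \wedge \bigwedge_{i=1}^{n} l_i \vdash_{\sigma'} w_i] & \text{L is a translation} \\
\Rightarrow & \exists\sigma', \sigma''.\ [\sigma\ \delta\ \sigma' \wedge \sigma'\ \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n}\ \sigma''] & \text{definition of assignment relations} \\
\Rightarrow & \exists\sigma''.\ \sigma\ (\delta; \boxed{l_1, \ldots, l_n \mapsto e_1, \ldots, e_n})\ \sigma'' & \text{relation composition}
\end{array}
\]


via Transfer Relations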


In Chapter 2, we introduced the store as an object for modeling the state of memory during
a point of program execution. We also introduced the notion of relating a store at one point
in an execution to some later point in the execution; these relations are called transfer
relations. For the sake of automatic program analysis, we developed in Chapter 3 a computer
representation for the kinds of transfer relations that might naturally correspond to such
execution segments, and we gave an algorithm for composing these representations, to build bigger
execution segments out of smaller ones.

However,  those  chapters  presented  these  concepts  in  an  abstract  manner,  apart  from  any 
particular  programming  language.  Although  we  gave  examples  designed  to  spark  intuition  about 
actual  programming  languages,  we  never  described  how  these  transfer  relations  correspond  to 
any  kind  of  a  semantics  of  a  programming  language.  In  this  chapter,  we  describe  a  semantic 
methodology  of  programming  languages  that  is  founded  upon  transfer  relations,  and  as  such  is 
particularly  useful  as  a  basis  for  program  analysis. 


4.1  Denotational  and  Operational  Semantics 

The  semantics  of  programming  languages  is  a  topic  both  broad  and  deep,  and  we  can  only 
touch  on  some  of  the  overarching  issues  here,  in  order  to  put  our  work  in  a  larger  perspective. 
A  denotational  semantics  [Sto77]  uses  structural  induction  to  assign  each  term  in  the  source 
language  an  object  in  some  abstract  model.  The  spirit  of  denotational  semantics  is  to  model 
function  terms  in  the  source  language  with  actual  functions.  This  turns  out  to  be  difficult; 
Dana Scott solved the underlying problems [Sco70, Sco76, Sco82]. On the other hand,
Jean-Yves Girard in [GLT89] makes the following philosophical observation about the α, β, and η
equations of λ-calculus [Bar84]:

In fact, these equations may be read in two different ways, which re-iterate the
dichotomy [in logic] between sense and denotation:

• as the equations which define the equality of terms, in other words the equality
of denotations (the static viewpoint).

• as rewrite rules which allows us to calculate terms by reduction to a normal
form. That is an operational, dynamic viewpoint, the only truly fruitful view
for this aspect of logic.

Of course the second viewpoint is under-developed by comparison with the first
one, as was the case in Logic! For example denotational semantics of programs
(Scott’s semantics, for example) abound: for this kind of semantics, nothing changes
throughout the execution of a program. On the other hand, there is hardly any
civilised operational semantics of programs (we exclude the ad hoc semantics which
crudely paraphrase the steps toward normalisation). The establishment of a truly
operational semantics of algorithms is perhaps the most important problem in
computer science.

The dichotomy between sense and denotation in logic to which Girard refers is the comparison
between Tarski’s classical view, in which for instance the meaning of A ∧ B is its truth value
and is given by a truth table on the meanings of A and B, and Heyting’s intuitionistic view,
in which the meaning of A ∧ B is a proof and is given by a proof of A coupled with a proof
of B. This leads to the study of proof theory, of which the most famous result is the
Curry-Howard isomorphism between natural deduction and the simply-typed λ-calculus in which types
correspond to sentences, terms correspond to proofs (meanings, in the Heyting view), and
reduction of terms corresponds to rewriting of proofs. Girard points out that “the fundamental
idea of denotational semantics is to interpret reduction (a dynamic notion) by equality (a static
notion)”.

This  discussion  provides  some  insight  into  the  role  of  semantics  in  program  analysis.  Strictly 
speaking,  a  program  analysis  does  not  answer  questions  about  a  program,  it  answers  questions 
about  a  program’s  semantics.  This  is  a  rather  specialized  and  practical  application  of  semantics. 
As  Girard  points  out,  denotational  semantics  is  concerned  with  the  equality  of  programs — a 
notion  that  is  undecidable  for  most  languages.  One  of  the  practical  benefits  of  a  well-designed 
denotational  semantics  is  to  shed  light  upon  or  otherwise  aid  in  the  reasoning  about  program 
equivalence.  It  stands  to  reason,  then,  that  the  purpose  of  a  program  analysis  based  on  a 
denotational  semantics  must  be  to  provide  some  automatic  support  for  reasoning  about  program 
equivalence. 

Almost  as  soon  as  abstract  interpretation  arrived  on  the  scene,  to  make  a  connection  between 
program  analyses  and  the  semantics  of  programming  languages,  a  great  amount  of  effort  was 
spent  in  adapting  it  to  denotational  semantics.  For  some  examples,  see  [Myc81],  [Nie84],  [Nie86], 
and  [AH87].  As  one  would  expect,  this  body  of  work  offers  some  of  the  most  esthetically 
pleasing  formulations  of  program  analyses,  but  it  also  has  found  little  use  beyond  a  narrow 
range  of  applications  such  as  strictness  analysis  [BHA86]. 

In general, however, one would like a program analysis to produce some information about
a dynamic interpretation of a program rather than this static denotation. This is why most
program analyses are based on an operational semantics that describes how a program reduces
during execution.

This is also perhaps why the field of program analysis continues to struggle for acceptance in
programming-language theory. As Girard says, operational semantics tend to be “uncivilised”,
and despite the frameworks of structural operational semantics [Plo81] (generalized to infinite
behaviors in [CC92b]) and natural semantics [Kah87], operational semantics does not have
nearly the developed and refined theory of denotational semantics. Even worse, despite an
effort by Schmidt in [Sch95] to begin to develop a sub-framework of abstract interpretation for
natural semantics, most program analyses use what Girard calls the “ad hoc semantics that
crudely paraphrase the steps toward normalisation”. The reason is that a program analysis
is usually designed to answer questions about “the run-time behavior” of a program, which
requires this crude notion of operational semantics: ad hoc because it is modeling the execution
of a program on some kind of machine, and paraphrasing normalization steps because they are
precisely the steps of execution on this machine.

Therefore, whether they are presented in this manner or not, most useful program analyses
are founded upon semantics based on transition systems. These semantics mimic the execution
of a program on a particular abstract machine that reflects the properties of interest. In our
work, the store is the heart of such an abstract machine. We designed the store with this
application in mind.


4.2 Modeling a Program as a Transition System

Typically, a dynamic semantics of a programming language must model two components of
execution: data and control. By data, we mean the state of memory. In our framework, the
data is modeled by a store. By control, we mean the state of the code itself. For instance, the
control might be modeled by a label describing the position in the code that is scheduled to be
executed next.

In some operational semantics, the control state and the data state are intertwined. For
instance, in context semantics [FF86], a state of execution is simply a syntactic term; the control
state is encoded as the next redex to be reduced, and the data state is modeled with syntactic
constructs (such as substitution or the heap variables in [MFH95]) and folded into the term
itself. But much of program analysis is concerned with analyzing the patterns of data access
during execution, and so we wish to keep control and data explicitly separated.

This inspires a semantic methodology in which a program is modeled by a transition system.
As described in Chapter 2, given a set Var of variables and a set Val of values, one can define
the set Store of stores that model the instantaneous states of data. We must introduce a new
set CtrlPoint of control points, such as labels, that model the instantaneous syntactic position
of execution. A transition system is then a tuple

$$(\mathit{CtrlPoint}, \mathit{Var}, \mathit{Val}, \longmapsto)$$

where

$$\longmapsto \;\subseteq\; (\mathit{CtrlPoint} \times \mathit{Store}) \times (\mathit{CtrlPoint} \times \mathit{Store})$$

is a single-step binary transition relation between adjacent control-store pairs in an execution,
where Store is defined from Var and Val as in Chapter 2:

$$\mathit{Store} = \mathit{Lval} \to \mathit{Val}$$

$$\mathit{Lval} = \mathit{Var} \cup (\mathit{Val} \times \mathit{Val}) \qquad \text{l-values}$$

In other words,

$$(C, \sigma) \longmapsto (C', \sigma')$$

if execution can proceed in one step from a state at control point $C \in \mathit{CtrlPoint}$ and store
$\sigma \in \mathit{Store}$ to a state at control point $C'$ and store $\sigma'$.

As we described in Chapter 2, stores have rich structure for analysis. A control point
is typically much simpler. For instance, if the subterms (e.g., commands, expressions) in a
program are uniquely labeled, then a control point often can be simply the label of the next
subterm to be executed. Or the control point might be the unlabeled subterm itself. As another
example, the control point of a machine-language program is the value of the program counter,
while the rest of the registers and memory are modeled by the store.
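
To make the separation of control and data concrete, the following is a minimal sketch of a transition system in Python. It is an illustration only: the three-statement program, the integer labels used as control points, and the names `step` and `run` are assumptions made for this example, and the store is simplified to a dictionary from variable names to integers rather than the full Lval-indexed store of Chapter 2.

```python
from typing import Dict, Iterator, Tuple

Store = Dict[str, int]        # simplified store: variables only, integer values
State = Tuple[int, Store]     # (control point, store)

# Hypothetical program used for illustration:
#   1: x := 3
#   2: y := x + 1
#   3: (end of program)
def step(state: State) -> Iterator[State]:
    """Yield every state reachable from `state` in a single step."""
    ctrl, store = state
    if ctrl == 1:
        yield 2, {**store, "x": 3}
    elif ctrl == 2:
        yield 3, {**store, "y": store["x"] + 1}
    # Control point 3 has no outgoing transitions.

def run(state: State) -> State:
    """Follow the (here deterministic) relation until no step applies."""
    while True:
        successors = list(step(state))
        if not successors:
            return state
        state = successors[0]

final = run((1, {}))
```

Here the control component is just the label of the next statement, while everything else about the machine state lives in the store.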

There are many examples of other kinds of transition systems as models of programming
languages. Most of these systems do not use our precise notions of control and store, but
they all have some notion of a control state and a data state. These systems are also called
abstract machines. Our notion of a store is expressive enough to encompass all of these, with
the appropriate choice of the set Val of values.

The heart of a transition system is the definition of the transition relation $\longmapsto$. Almost
always, $\longmapsto$ is defined by a set of meta-rules that define how each occurrence of a certain kind
of syntactic term in the program induces a family or families of transitions. We give a concrete
example of this in Section 5.8.2, and the rules in that section are indeed quite standard. But for
the rest of this chapter we instead describe a different approach. This novel approach replaces
the traditional meta-rules with our computer-representable transfer relations, thus opening the
door to a wide range of program-analysis possibilities.


4.3  Modeling  a  Program  as  a  Table  of  Transfer  Relations 

In the previous section, we suggested modeling the semantics of a program with a transition
system. We explained that a transition system is a tuple

$$(\mathit{CtrlPoint}, \mathit{Var}, \mathit{Val}, \longmapsto)$$

where

$$\longmapsto \;\subseteq\; (\mathit{CtrlPoint} \times \mathit{Store}) \times (\mathit{CtrlPoint} \times \mathit{Store})$$

is a single-step binary transition relation between adjacent control-store pairs in an execution,
where Store is defined from Var and Val as in Chapter 2.

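Observe, however, the following isomorphism:

$$\mathcal{P}((\mathit{CtrlPoint} \times \mathit{Store}) \times (\mathit{CtrlPoint} \times \mathit{Store})) \;\cong\; \mathit{CtrlPoint} \times \mathit{CtrlPoint} \to \mathcal{P}(\mathit{Store} \times \mathit{Store})$$

Therefore, given any transition relation

$$\longmapsto \;\subseteq\; (\mathit{CtrlPoint} \times \mathit{Store}) \times (\mathit{CtrlPoint} \times \mathit{Store})$$

one can view $\longmapsto$ as a table of binary relations on stores, indexed by pairs of program points.
If we write the $(C, C')$ entry of this table as $\longmapsto_{C,C'}$ then the correspondence is as follows.

$$(C, \sigma) \longmapsto (C', \sigma') \quad\text{iff}\quad \sigma \longmapsto_{C,C'} \sigma'$$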
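
The isomorphism can be sketched on finite data as a pair of conversion functions. This is only an illustration under stated assumptions: the names `to_table` and `from_table` are hypothetical, stores are encoded as frozensets of (variable, value) pairs so that they can be set members, and a missing table entry stands for the empty relation.

```python
from collections import defaultdict

def to_table(transitions):
    """Group ((C, s), (C2, s2)) pairs into a table indexed by (C, C2)."""
    table = defaultdict(set)
    for (c, s), (c2, s2) in transitions:
        table[(c, c2)].add((s, s2))
    return dict(table)

def from_table(table):
    """Rebuild the flat transition relation from the indexed table."""
    return {((c, s), (c2, s2))
            for (c, c2), relation in table.items()
            for (s, s2) in relation}

# Two hypothetical stores, encoded as hashable frozensets of bindings.
s0 = frozenset({"x": 0}.items())
s1 = frozenset({"x": 1}.items())

transitions = {((1, s0), (2, s1)), ((1, s1), (2, s1)), ((2, s1), (1, s0))}
table = to_table(transitions)
```

Converting to the table and back returns the original relation, which is the finite analogue of the correspondence stated above.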

Now, recall that transfer relations are binary relations on stores:

$$\mathit{TR} \subseteq \mathcal{P}(\mathit{Store} \times \mathit{Store})$$

But not all binary relations on stores are transfer relations. In Section 2.4.1, we argued that
some binary relations on stores are not “natural”, in that they will not occur in any reasonable
programming language. This notion of naturalness motivated the design of our language TR
of transfer relations, and our claim is that TR is indeed rich enough to model programming
languages.

The implications of that claim now become manifest. We now claim not only that a transition
relation $\longmapsto$ can be replaced by a table in

$$\mathit{CtrlPoint} \times \mathit{CtrlPoint} \to \mathcal{P}(\mathit{Store} \times \mathit{Store})$$

of binary relations on stores, but that it can indeed be replaced by a table in

$$\mathit{CtrlPoint} \times \mathit{CtrlPoint} \to \mathit{TR}$$

of terms in our language of transfer relations, again indexed by the control points before and
after the transition. Ultimately, this is more of a philosophical claim than a provable statement.
The claim is that the language TR of transfer relations is expressive enough to model all possible
store changes that may arise as single execution steps of any reasonable programming language.
To support this claim, we will demonstrate in future chapters that TR is indeed expressive
enough to model a wide variety of programming-language constructs.

Given this claim, one may replace any transition system

$$(\mathit{CtrlPoint}, \mathit{Var}, \mathit{Val}, \longmapsto)$$

by a tuple

$$(\mathit{CtrlPoint}, \mathit{Var}, \mathit{Val}, \mathit{Primop}, \Delta)$$

where

$$\Delta \in \mathit{CtrlPoint} \times \mathit{CtrlPoint} \to \mathit{TR}$$

and the set TR of transfer relations is defined as in Chapter 2 from the sets Var, Val, and Primop.
Now, the heart of a semantic definition of a programming language is not a set of meta-rules
defining a transition relation $\longmapsto$, but is instead a description of how to map a program in
the language to a table $\Delta$ of transfer relations, one transfer relation for each pair of control
points in the program, describing the single steps of program execution. By convention, we
write $\Delta(C, C')$ as $\Delta_{C,C'}$
and call these transfer relations the single-step transfer relations of the program.

There are $|\mathit{CtrlPoint}|^2$ single-step transfer relations. As described above, CtrlPoint is typically
a set of pointers into the text of a program P, and so $|\mathit{CtrlPoint}|$ will usually be linear in the
size of P. If that size is n, then this means that there are $O(n^2)$ single-step transfer relations
in the semantics of P. However, the vast majority of these will be the empty relation, because
transitions between most pairs of control points are impossible. For instance, in straight-line
code, the only possible transitions are between adjacent control points, and so there are only
$O(n)$ non-empty single-step transfer relations, as one would expect.

Note that transfer relations thus encode control-flow information about the program. If
$\Delta_{C,C'} = \emptyset$ then it is not possible for the program in question to take a single step from control
point $C$ to control point $C'$. Similarly, if $\Delta_{C,C'} = e?\,A$ then a single step from $C$ to $C'$ is
possible only from stores at $C$ in which $e$ evaluates to true.
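
A small sketch may help show how a sparse table carries control-flow information. It is an assumption-laden illustration, not the TR language itself: transfer relations are modeled as Python functions from a store to a list of successor stores, `guard` is a hypothetical helper that stands in for a guarded relation whose body is the identity, and absent table entries play the role of the empty relation.

```python
def guard(pred):
    """Relate a store to itself exactly when `pred` holds for it."""
    return lambda store: [store] if pred(store) else []

def empty(store):
    """The empty relation: no successor store for any input."""
    return []

# Hypothetical branch at control point 1: go to 2 when x > 0, else to 3.
delta = {
    (1, 2): guard(lambda s: s["x"] > 0),
    (1, 3): guard(lambda s: not s["x"] > 0),
}

def successors(ctrl, store, points=(1, 2, 3)):
    """All (control point, store) pairs reachable in one step."""
    return [(c2, s2)
            for c2 in points
            for s2 in delta.get((ctrl, c2), empty)(store)]
```

Only two of the nine possible entries are present, and which of them yields a successor depends on the store at the branch point.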

It  is  interesting  to  note  that  each  non-empty  single-step  transfer  relation  replaces  an  infinite 
family  of  transitions.  This  is  demonstrated  with  the  following  simple  example. 

Example 19 Suppose the transfer-relation semantics of program P is the tuple

$$(\mathit{CtrlPoint}, \mathit{Var}, \mathit{Val}, \mathit{Primop}, \Delta)$$

and that $\Delta_{C,C'} = x \mapsto y + 1$ for some $C, C' \in \mathit{CtrlPoint}$. Then the transition-system semantics

$$(\mathit{CtrlPoint}, \mathit{Var}, \mathit{Val}, \longmapsto)$$

of P includes the family of transitions

$$(C, \sigma) \longmapsto (C', \sigma[x \mapsto (\sigma\, y) + 1])$$

where $\sigma \in \mathit{Store}$ is any store.
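
The point of Example 19 can be sketched as follows: one description of the store change generates a transition for every store. The function name, the string labels for the two control points, and the sample stores are assumptions made for this illustration, with stores simplified to dictionaries.

```python
def assign_x_from_y_plus_1(store):
    """The store update x |-> y + 1, applied to an arbitrary store."""
    return {**store, "x": store["y"] + 1}

# Three sample stores; any store that binds y would do.
samples = [
    {"x": 0, "y": 0},
    {"x": 7, "y": 41},
    {"x": -3, "y": -1, "z": 9},
]

# One member of the transition family per sample store.
transitions = [(("C", s), ("C'", assign_x_from_y_plus_1(s))) for s in samples]
```

The list `transitions` is a finite sample of the infinite family; the single function is what the transfer relation records.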

4.4  Composing  Single-Step  Transfer  Relations 

Given the single-step transfer relations $\Delta$ for a program, one can use the transfer-relation
composition algorithm $\odot$ from Chapter 3 to compute the transfer relation for any finite control
path in a program.

4.4.1  Two-step transition sequences

Suppose the transfer-relation semantics of program P is the tuple

$$(\mathit{CtrlPoint}, \mathit{Var}, \mathit{Val}, \mathit{Primop}, \Delta)$$

and, following our convention of notation, we write $\Delta(C, C')$ as $\Delta_{C,C'}$.

The single-step transfer relation $\Delta_{C,C'}$ defines all of the transitions from control point $C$
to control point $C'$. The single-step transfer relation $\Delta_{C',C''}$ defines all of the transitions from
control point $C'$ to control point $C''$. Therefore, the relation

$$\Delta_{C,C'} \,;\, \Delta_{C',C''}$$

defines exactly the two-step transition sequences from $C$ through $C'$ to $C''$.

Suppose that all primitive operations $p \in \mathit{Primop}$ are deterministic, as will be the case in
most programming languages, and that we are given the symbolic evaluation

$$\mathcal{P} \in \mathit{Primop} \to \mathit{Exp}^* \to \mathit{ATR} \to \mathit{Exp}$$

as defined by Definition 5 and translation

$$\mathcal{C} \in \mathit{Exp} \to \mathit{TR} \to \mathit{TR} \to \mathit{TR}$$

as defined by Definition 8. Then we know from Theorem 4 that

$$(\Delta_{C,C'} \odot \Delta_{C',C''}) \in \mathit{TR}$$

is equivalent to $\Delta_{C,C'} \,;\, \Delta_{C',C''}$ and thus defines exactly the two-step transition sequences from $C$
through $C'$ to $C''$. But the profound advantage of $\Delta_{C,C'} \odot \Delta_{C',C''}$ over $\Delta_{C,C'} \,;\, \Delta_{C',C''}$ is that,
syntactically, the former is a (computer-representable) term in our language TR of transfer
relations, while the latter is not. The advantage of this is illustrated in the following example,
which also serves as a reminder of why we disallowed explicit syntactic composition of transfer
relations in the language TR.

Example 20 Suppose

$$\Delta_{C,C'} = x \mapsto y + 1$$

$$\Delta_{C',C''} = x \mapsto x - 1$$

Suppose also that all primitive operations $p \in \mathit{Primop}$ are deterministic, and $\{+, -\} \subseteq \mathit{Primop}$.
Assuming an arbitrary $\mathcal{C}$ (because it will not be used here) and the trivial $\mathcal{P}$ that is the identity
function on context-independent operations (such as $+$ and $-$), we have that

$$\Delta_{C,C'} \odot \Delta_{C',C''} \;=\; (x \mapsto y + 1) \odot (x \mapsto x - 1) \;=\; x \mapsto (y + 1) - 1$$

describes exactly the possible net effects of the two-step fragment of execution from control point
$C$ through $C'$ to $C''$. If $\mathcal{P}$ instead performs some better symbolic evaluation, $\odot$ may yield a
better result such as

$$x \mapsto y$$

Either way, it is more enlightening as to the behavior of this execution fragment than the term

$$(x \mapsto y + 1) \,;\, (x \mapsto x - 1)$$

By convention we write $\Delta_{C,C'} \odot \Delta_{C',C''}$ as $\Delta_{C,C',C''}$.
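
The calculation in Example 20 can be imitated with a small substitution-based sketch. This is not the composition algorithm of Chapter 3, which handles the full TR language; it is a simplified stand-in that assumes transfer relations are plain simultaneous assignments to variables. The tuple encoding of expressions and the names `subst`, `compose`, and `simplify` are hypothetical.

```python
def subst(expr, updates):
    """Replace each variable in `expr` by the expression `updates` gives it."""
    kind = expr[0]
    if kind == "var":
        return updates.get(expr[1], expr)
    if kind == "const":
        return expr
    op, left, right = expr
    return (op, subst(left, updates), subst(right, updates))

def compose(first, second):
    """Symbolically compose two assignment maps: `first`, then `second`."""
    result = dict(first)
    for var, expr in second.items():
        result[var] = subst(expr, first)
    return result

def simplify(expr):
    """A toy symbolic evaluator: rewrite (e + c) - c to e."""
    if expr[0] in ("var", "const"):
        return expr
    op, left, right = expr[0], simplify(expr[1]), simplify(expr[2])
    if op == "-" and left[0] == "+" and right[0] == "const" and left[2] == right:
        return left[1]
    return (op, left, right)

x, y, one = ("var", "x"), ("var", "y"), ("const", 1)
step1 = {"x": ("+", y, one)}      # x |-> y + 1
step2 = {"x": ("-", x, one)}      # x |-> x - 1
composed = compose(step1, step2)  # x |-> (y + 1) - 1
improved = {v: simplify(e) for v, e in composed.items()}
```

Without simplification the result mirrors the term (y + 1) - 1; with the toy evaluator it becomes y, matching the two outcomes discussed in the example.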

Suppose that there are nondeterministic primitive operations in Primop. Then from Theorem 3
we have that $\Delta_{C,C'} \odot \Delta_{C',C''}$ is a superset of $\Delta_{C,C'} \,;\, \Delta_{C',C''}$ and thus defines at least all
of the two-step transition sequences from $C$ through $C'$ to $C''$. It might, however, relate some
initial stores (i.e., at $C$) to some final stores (i.e., at $C''$) that cannot be achieved by such a
two-step transition sequence.

We now generalize the above to arbitrary-length sequences of transitions.


4.4.2  Arbitrary-length  transition  sequences 

Single transitions involve two control points, an initial and a final. Two-step transition
sequences involve three control points, an initial, a middle, and a final. We generalize this
concept with the following definition.

Definition 10 Given a set CtrlPoint of control points, a control path is a sequence of one
or more control points. The set of control paths is written $\mathit{CtrlPoint}^+$. If $\Gamma \in \mathit{CtrlPoint}^+$
and $\Gamma' \in \mathit{CtrlPoint}^+$ then $\Gamma, \Gamma' \in \mathit{CtrlPoint}^+$ is their concatenation. For any $\Gamma \in \mathit{CtrlPoint}^*$,
$\Gamma' \in \mathit{CtrlPoint}^*$, and $C \in \mathit{CtrlPoint}$, $(\Gamma, C) ; (C, \Gamma') = \Gamma, C, \Gamma'$.

A transfer-relation semantics provides the single-step transfer relations of a program that
define all of the program’s valid transitions. Transitions are merely execution sequences through
control paths of length two. Concatenating adjacent transitions, or equivalently, composing
adjacent transfer relations, produces the execution sequences through control paths of length
three. Another composition covers the control paths of length four, and so forth.

Therefore we define from the finite collection of single-step transfer relations $\Delta_{C,C'}$ the
infinite collection of transfer relations $\Delta_\Gamma$ for all control paths $\Gamma$ of length at least 2:

$$\Delta_{\Gamma;\Gamma'} = \Delta_\Gamma \odot \Delta_{\Gamma'}$$

Note that this definition is nondeterministic, because there are $n - 2$ ways to split up a length-n
control path $\Gamma''$ into $\Gamma$ and $\Gamma'$ such that $\Gamma'' = \Gamma ; \Gamma'$, because any control point in the path $\Gamma''$
other than the end points can act as the “pivot point”. In fact, the choice of pivot point will
in general produce different syntactic transfer relations. But if $\odot$ is a translation, which by
Theorem 4 will be the case if all primitive operations $p \in \mathit{Primop}$ are deterministic, then by
associativity of relation composition, all of these transfer relations are semantically equivalent.
If on the other hand $\odot$ is merely an upper approximation, then they may not all be semantically
equivalent, but they are all supersets of the true composition.
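
The role of the pivot point can be sketched semantically. In this illustration, which is an assumption-based model rather than the symbolic algorithm, transfer relations are Python functions from a store to a list of successor stores, `compose` is ordinary relational composition, and `pick` is a hypothetical parameter that selects the pivot index for a path of a given length.

```python
def compose(r1, r2):
    """Relational composition: run r1, then r2 on each intermediate store."""
    return lambda s: [s2 for s1 in r1(s) for s2 in r2(s1)]

def path_relation(path, delta, pick):
    """Relation for a control path, split recursively at the pivot `pick` chooses."""
    if len(path) == 2:
        return delta[(path[0], path[1])]
    k = pick(len(path))  # pivot index, with 1 <= k <= len(path) - 2
    return compose(path_relation(path[:k + 1], delta, pick),
                   path_relation(path[k:], delta, pick))

# Hypothetical single-step relations along the path 1, 2, 3, 4.
delta = {
    (1, 2): lambda s: [{**s, "x": s["y"] + 1}],
    (2, 3): lambda s: [{**s, "x": s["x"] - 1}],
    (3, 4): lambda s: [s] if s["x"] > 0 else [],
}

leftmost = path_relation((1, 2, 3, 4), delta, lambda n: 1)
rightmost = path_relation((1, 2, 3, 4), delta, lambda n: n - 2)
```

Both pivot strategies build differently nested compositions, yet they agree on every store, which is the associativity argument in miniature.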

4.5  Treatment  of  Errors

There are three ways in which an implementation of a programming language treats an error.

1. The error may be caught at compile time. For instance, most languages with static typing,
such as ML, will prevent at compile time all attempts to add an integer and a boolean
value.

2. An error not caught at compile time may be caught at run time. For instance, ML’s static
type system cannot detect out-of-bound array references. Instead, all array references
perform a test at run time to check if the index is within bounds of the array, and raise
an exception if the test fails.

3. Finally, an error may not be caught at all. In this case, the semantics of the error is
unspecified, and the execution is allowed to “run wild” after the error. In ML, no errors
reach this stage; they are all caught either at compile time or at run time. But in C,
for instance, it is possible to extract a value from an uninitialized local variable, but the
definition of C does not specify this value. Also, one may cast any integer into a pointer
and attempt to write into that address in memory. Depending on the implementation, this
can produce a wide range of errors that are impossible to catch at run time. For instance,
the address of a pointer may happen to correspond to a local variable on the stack, and
so any write through that pointer changes the value of the variable. Similarly, writing past the
boundary of a struct or array may interfere with other data structures. An even more
dramatic example of bad behavior resulting from uncaught errors is the overwriting of
code by bad pointers or out-of-bound array references. The semantics of C is unspecified
for such programs; once such an error occurs, the execution may run wild. Depending
on the implementation, the execution may proceed in an unpredictable manner or may
violate the run-time system, causing a segmentation fault or bus error.

We discuss the way in which our methodology addresses these three kinds of errors.

1. Because our methodology is appropriate only for a dynamic semantics of a language (in
other words, the run-time behavior) and not for a static semantics (for instance, the
static type system), we do not address errors caught at compile time. We assume that
the static semantics has already caught these errors and provided the dynamic semantics
with a program that is free of these errors. A dynamic semantics may, incidentally,
provide a model for programs that contain compile-time errors, but it does not matter
what this model is. For example, in Chapters 5 and 6, we will use our methodology to
model languages with records. These semantics will not model the type of a record (in
other words, the names and types of its fields), and thus allow type-unsafe uses of the
record (for instance, an attempt to read a non-existent field). These semantics do provide
a model for such type-unsafe operations, but it is expected that a static semantics will
ensure that they will never occur.

2. Because our methodology is for the design of a dynamic semantics, modeling the run-time
behavior of programs, it should be able to provide a treatment of errors caught at run
time. Typically, when a run-time error is detected, control proceeds to an error handler.
For instance, a run-time error in ML raises an exception which is caught by either a
user-defined handler or the run-time system’s top-level handler. Nothing prevents our
methodology from dealing with run-time errors in a similar fashion. Suppose that $(C, \sigma)$
is a state in which a run-time error occurs. For instance, in an ML program, control point
$C$ may reference code to take the head of a list object in store $\sigma$, and that object is nil.
Then there will be a transition

$$(C, \sigma) \longmapsto (C', \sigma')$$

where $C'$ is the entry of some error handler, which in the case of ML will be an exception
handler.

However,  in  this  dissertation  we  will  not  give  any  examples  of  such  run-time  errors.  In 
other  words,  the  languages  in  Chapters  5,  6,  and  7  do  not  perform  any  run-time  checks. 

3.  We  assume  that  if  an  error  is  not  caught  at  compile  time  or  run  time,  then  the  run-time 
behavior  of  the  program  after  the  error  occurs  is  unspecified.  Therefore,  our  methodology 
treats  these  errors  in  the  same  way  as  it  treats  compile-time  errors:  we  assign  a  behavior 
to  a  program  that  exhibits  such  an  error,  but  the  particular  behavior  is  unimportant 
because  conceptually  the  semantics  is  unspecified  in  that  case. 

Generally, the value undef may come about as the result of errors that are allowed to
“run wild” and thus have unspecified run-time behavior, in other words, the first and third
categories above. For instance, in Chapters 5, 6, and 7 we will give a semantic model in which an
attempt to look up the value of an unbound variable or field of a data structure results in undef,
and primitive operations are defined on undef. For instance, (1 + undef) evaluates to undef,
and (undef = undef) evaluates to true. The latter may seem odd, but is perfectly reasonable
because, once again, the run-time behavior of errors that produce undefined is unspecified.
Essentially, we need only model the run-time behavior of programs that do not exhibit any
errors in the first and third categories above.
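
The behavior of undef described above can be sketched as follows. The sentinel object `UNDEF` and the helper names `lookup`, `add`, and `equals` are assumptions made for this illustration; only the two stated facts, that addition propagates undef and that undef equals itself, come from the text.

```python
UNDEF = object()  # hypothetical sentinel standing for the value undef

def lookup(store, var):
    """Looking up an unbound variable yields undef rather than failing."""
    return store.get(var, UNDEF)

def add(a, b):
    """Addition extended to undef: any undef operand makes the result undef."""
    if a is UNDEF or b is UNDEF:
        return UNDEF
    return a + b

def equals(a, b):
    """Equality extended to undef: undef is equal to itself."""
    return a == b

store = {"x": 1}
total = add(lookup(store, "x"), lookup(store, "z"))  # z is unbound
```

Because the model assigns a value to every operation, evaluation never gets stuck on these errors, even though the particular value chosen carries no meaning.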

It is worth commenting on a phenomenon with transfer relations that should not be confused
with the treatment of errors. Suppose control path $\Gamma$ begins with control point $C$. Given the
transfer relation $\Delta_\Gamma$ corresponding to control path $\Gamma$, and given a store $\sigma$, if there is no $\sigma'$
such that $\sigma \;\Delta_\Gamma\; \sigma'$, then it means that execution from the state $(C, \sigma)$ cannot progress through
control path $\Gamma$. For instance, $C$ may be a branching point, with path $\Gamma$ proceeding down the
branch for when $x > 0$, but the value of $x$ in $\sigma$ is not greater than 0. This is not intended to
be a way of modeling errors that may have occurred during control path $\Gamma$.


A  Case  Study:  The  Language  Mini-C


In this chapter, we present Mini-C, a simple imperative language with while loops, assignment,
mutable records, and immutable tuples, but without procedures or arrays. We also
give a semantics for Mini-C in terms of a transition system. As suggested in Chapter 4, we
will demonstrate two different techniques for defining that transition system: the traditional
approach of meta-rules and our new approach using single-step transfer relations.

The main purpose of this chapter is to develop a relatively straightforward case study of
our approach to program analysis.


5.1  Syntax 


A Mini-C program is a list of zero or more statements. A statement is either an assignment,
an allocation of a new record with n named fields, a conditional with a statement list for each
branch, or a while loop with a statement list for a body.

$$
\begin{array}{rcll}
S & ::= & \{s_1, \ldots, s_n\} & \text{(ordered) statement list } (n \geq 0) \\
s & ::= & L := E & \text{assignment statement} \\
  & |   & L := \{f_1 = E_1, \ldots, f_n = E_n\} & \text{mutable-record allocation} \\
  & |   & \mathtt{if}\ E\ \mathtt{then}\ S\ \mathtt{else}\ S' & \text{conditional statement} \\
  & |   & \mathtt{while}\ E\ \mathtt{do}\ S & \text{while loop} \\
f & \in & \mathit{Field} & \text{mutable-record field names}
\end{array}
$$

We define the source expressions and source l-expressions slightly differently from the
expressions and l-expressions in Chapter 2.

$$
\begin{array}{rcll}
E & ::= & L & \text{location lookup} \\
  & |   & P(E_1, \ldots, E_n) & \text{primitive application} \\
L & ::= & x & \text{variable location} \\
  & |   & E.f & \text{data subcomponent location} \\
x & \in & \mathit{Var} & \text{variables}
\end{array}
$$

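Finally, the source primitive operations are

$$
\begin{array}{ll}
c & \text{constants (nullary)} \\
+ \quad - \quad * & \text{integer operations (binary, and unary } - \text{)} \\
\texttt{<>=<>\&} & \text{boolean operations (binary)} \\
\mathtt{if} & \text{conditional expression (ternary)} \\
\mathtt{tuple} & \text{immutable-tuple construction (n-ary)} \\
\pi_j & \text{immutable-tuple component selection (unary)}
\end{array}
$$

where the set Constant of constants is

$$
\begin{array}{rcll}
c & ::= & n & \text{integers (Int)} \\
  & |   & \mathtt{true} \mid \mathtt{false} & \text{booleans}
\end{array}
$$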

All of the source primitives are simple (i.e., deterministic and context-independent). We leave
the set Field of field names of mutable data structures open for the moment. Note that constants
are nullary primitive operations, as suggested in Chapter 2. We will sometimes use nil as
another name for false. (One could just as easily add nil as another constant.)

We  adopt  the  following  syntactic  conventions. 

•  The statement list $\{s\}$ may be written as $s$.

•  The expression $c()$ may be written as $c$ (our usual convention).

•  The expression $P(E, E')$ may be written in infix as $E\ P\ E'$.

•  The expression $\mathtt{tuple}(E_1, \ldots, E_n)$ may be written as $(E_1, \ldots, E_n)$.

5.2  Discussion 

Mini-C does not have procedures, but its data features and imperative features are similar
to those of C. Allocating a Mini-C record with n fields corresponds to calling the C malloc operation
to allocate a size-n block of memory on the heap and then immediately filling the block with
n values. Thus, Mini-C field names correspond to structure field names in C as well as the *
token for pointers.

For simplicity, Mini-C does not have arrays, which add an extra element of complexity in two
ways. One, they may be of statically unknown size. Two, the index (which would correspond to
the field name of a record) of an array dereference may also be statically unknown. In Chapter 7
we will develop a source language with arrays.

One feature of C that is not present in Mini-C is low-level control over data layout. C’s
& operator is not in Mini-C because there is no distinction in Mini-C between an updatable
data structure and a pointer to that structure. Intuitively, all updatable data structures in
Mini-C are pointers, in much the same way that all arrays in C are pointers. In contrast, C
provides a mechanism for distinguishing a struct itself from a pointer to that struct; this is
useful for programmer control of data layout, for instance, the inline allocation of a struct as a
field in another struct. Furthermore, there is no pointer arithmetic in Mini-C, nor is there
a notion of casting one updatable data structure to another. All of those features of C exist
to give the programmer low-level control of data layout. In this dissertation, we will not cover
such issues, and so Mini-C does not include those language features.

However, it is in fact possible to model a functionality similar to the C & operator, as
well as pointer arithmetic for an extension of Mini-C with arrays. We discuss this further in
Section 5.9.


5.3  Simplification  of  Syntax 

Above we presented the syntax of Mini-C in a form that is intended to be used by a programmer.
But it will be much more convenient to describe the semantics of Mini-C programs if we first
recast the syntax in a form that more closely fits our development of transfer relations in the
previous chapters.

Our first task is to recast source expressions and source l-expressions respectively into the
expressions and l-expressions of the transfer-relation language. We recall their definitions here.

$$
\begin{array}{rcl}
e \in \mathit{Exp} & ::= & x \mid p(e_1, \ldots, e_n) \\
l \in \mathit{Lexp} & ::= & x \mid e.e' \\
p & \in & \mathit{Primop} \\
x & \in & \mathit{Var}
\end{array}
$$

We begin by including all of the source primitive operations P in the set Primop. For the
purpose of modeling the semantics of Mini-C with transfer relations, we will have to add some
operations to Primop that are not available to the user at source level. First, we need to
add the field names Field to Primop as nullary primitive operations; then the source l-expression
$E.f$ is a member of Lexp.

Secondly, we need to add the context-dependent binary primitive operation deref of Chapter 2
to Primop in order to consider the source expression $E.f$ to be the expression $\mathtt{deref}(E, f) \in \mathit{Exp}$.
Now all source expressions $E$ are in Exp, and all source l-expressions $L$ are in Lexp;
henceforth we will thus call them expressions and l-expressions and use the metavariables $e$ and $l$,
respectively.

Our second transformation is to “compile” a statement allocating a record with n fields into
a statement that allocates new memory followed by n statements that fill the n fields with their
values. For this purpose, we need the following, as suggested in Chapter 2:

•  for each natural number $m$, a pointer value $\langle m \rangle \in \mathit{Val}$

•  the unary primitive operation ptr that casts an integer $m$ to the pointer $\langle m \rangle$ (formally,
$\mathtt{ptr}(m) = \langle m \rangle$), where we write $\langle e \rangle$ for $\mathtt{ptr}(e)$

•  a distinguished variable H, initialized to 1 at the beginning of execution, to hold the integer
of the next free pointer on the heap

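We now perform the following transformation of a Mini-C program S. We assume without loss
of generality that H does not appear in S. We first add the assignment statement

$$H := 1$$

to the beginning of statement list S. Then we rewrite every allocation statement

$$l := \{f_1 = e_1, \ldots, f_n = e_n\}$$

as the sequence

$$
\begin{array}{l}
l := \langle H \rangle; \\
\langle H \rangle.f_1 := e_1; \\
\quad\vdots \\
\langle H \rangle.f_n := e_n; \\
H := H + 1
\end{array}
$$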

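Our resulting program is now in the simplified language

$$
\begin{array}{rcll}
S & ::= & \{s_1, \ldots, s_n\} & \text{(ordered) statement list } (n \geq 0) \\
s & ::= & l := e & \text{assignment statement} \\
  & |   & \mathtt{if}\ e\ \mathtt{then}\ S\ \mathtt{else}\ S' & \text{conditional statement} \\
  & |   & \mathtt{while}\ e\ \mathtt{do}\ S & \text{while loop}
\end{array}
$$

where $l \in \mathit{Lexp}$, $e \in \mathit{Exp}$, and Primop includes the source primitive operations as well as deref,
ptr, and $f$ for all $f \in \mathit{Field}$.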
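
The rewriting of allocation statements can be sketched as a recursive pass over a syntax tree. The tuple encoding of statements and expressions and the names `desugar_stmt`, `desugar_list`, and `desugar_program` are assumptions made for this illustration; the order of the generated statements follows the sequence given above.

```python
H = ("var", "H")  # the distinguished heap-pointer variable

def desugar_stmt(stmt):
    """Rewrite one statement into a list of simplified statements."""
    kind = stmt[0]
    if kind == "alloc":
        _, target, fields = stmt
        cell = ("ptr", H)  # the pointer <H>
        fills = [("assign", ("field", cell, f), e) for f, e in fields]
        bump = ("assign", H, ("+", H, ("const", 1)))
        return [("assign", target, cell)] + fills + [bump]
    if kind == "if":
        _, cond, then_branch, else_branch = stmt
        return [("if", cond, desugar_list(then_branch), desugar_list(else_branch))]
    if kind == "while":
        _, cond, body = stmt
        return [("while", cond, desugar_list(body))]
    return [stmt]  # plain assignments are already in the simplified language

def desugar_list(stmts):
    return [out for s in stmts for out in desugar_stmt(s)]

def desugar_program(stmts):
    """Prepend H := 1, then rewrite every allocation in the program."""
    return [("assign", H, ("const", 1))] + desugar_list(stmts)

# Hypothetical program: p := {a = 7, b = q}
program = [("alloc", ("var", "p"), [("a", ("const", 7)), ("b", ("var", "q"))])]
simplified = desugar_program(program)
```

After the pass, no allocation statements remain; the output uses only assignments, conditionals, and while loops.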

We now wish to design a transition system to describe the semantics of Mini-C programs
expressed in this simplified language. Recall that a transition system is a tuple

$$(\mathit{CtrlPoint}, \mathit{Var}, \mathit{Val}, \longmapsto).$$

We already have the set Var of variables; it is one of the syntactic domains of Mini-C. All we
have mentioned about the other three components is that Val includes the collection of pointers
$\langle n \rangle$. We now describe each of these remaining three components in turn.


5.4  Control  Points

In order to define a transition system describing the semantics of Mini-C programs, we must
design a way to refer to the control points in a program. The control points are merely the
statements in the program, and execution proceeds from control point to control point as it
processes the statements in order.

A control point of a program is an “index” into the syntax tree of the program. Formally,
a control point is a finite sequence of natural numbers.

$$C \in \mathit{CtrlPoint} = \mathit{Nat}^*$$

The empty sequence is written $\epsilon$. If $C \in \mathit{CtrlPoint}$, then $C.i \in \mathit{CtrlPoint}$ and $i.C \in \mathit{CtrlPoint}$
respectively represent the extensions of the sequence $C$ on the right and on the left by $i \in \mathit{Nat}$.
Intuitively, the numbers in a control point describe, from left to right, how to descend into
the syntax tree of a program. Formally, this is given below, where $S[C]$ returns the statement
within statement list $S$ at control point $C$.

$$
\begin{array}{rcll}
\{s_1, \ldots, s_n\}[i] & = & s_i & \\
\{s_1, \ldots, s_n\}[i.j.C] & = & S_j[C] & \text{if } s_i = \mathtt{if}\ e\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 \\
\{s_1, \ldots, s_n\}[i.C] & = & S[C] & \text{if } s_i = \mathtt{while}\ e\ \mathtt{do}\ S
\end{array}
$$

Example 21 The following Mini-C program is annotated with its control points.

        {
1           if n < 0 then
1.1.1           n := -(n)
            else
1.2.1           n := n * 2;
2           r := 1;
3           while n > 1 do
            {
3.1             r := r * n;
3.2             n := n - 1
            };
4           if r > 60 then
4.1.1           r := r - 1;
5           x := r
        }

Recall that a one-armed conditional if e then S is an abbreviation for if e then S else {}.

Note that a traversal into a conditional statement extends the control point by two numbers (an
index identifying one of the two branches and an index into the statement list in that branch),
while a traversal into a while loop extends the control point by only a single number (an index
into the loop body).
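The indexing rules for S[C] can be sketched as a short recursive lookup. The nested-tuple encoding of programs and the name `lookup` are assumptions made for this illustration only; control points are tuples of 1-based integers, and expressions are left as uninterpreted strings.

```python
# Example 21 as nested tuples (a hypothetical encoding, not the author's):
#   ("assign", l, e) | ("if", e, S1, S2) | ("while", e, S)
PROGRAM = [
    ("if", "n < 0", [("assign", "n", "-(n)")], [("assign", "n", "n * 2")]),
    ("assign", "r", "1"),
    ("while", "n > 1", [("assign", "r", "r * n"), ("assign", "n", "n - 1")]),
    ("if", "r > 60", [("assign", "r", "r - 1")], []),
    ("assign", "x", "r"),
]


def lookup(stmts, cp):
    """Return S[C] for a non-empty control point cp (a tuple of 1-based ints)."""
    s, rest = stmts[cp[0] - 1], cp[1:]
    if not rest:
        return s                                   # {s1, ..., sn}[i] = si
    if s[0] == "if":
        return lookup(s[1 + rest[0]], rest[1:])    # branch j, then index into Sj
    if s[0] == "while":
        return lookup(s[2], rest)                  # index into the loop body
    raise KeyError(cp)                             # assignments have no sub-statements


print(lookup(PROGRAM, (1, 2, 1)))
print(lookup(PROGRAM, (3, 2)))
```

Descending into a conditional consumes two numbers and descending into a while loop consumes one, which mirrors the note after Example 21.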
5.5 Values

As described above, the set Var of variables is already given as one of the syntactic domains of
Mini-C. So the next stage of the design of the transition system is the set Val of values. Recall
that the states in the system are pairs

CtrlPoint × Store

where stores are defined as in Chapter 2:

Store = Lval → Val
Lval  = Var ∪ (Val × Val)      l-values

We thus need to design an appropriate set of values for this store.

5.5.1  Constants,  field  names,  and  pointers 

Recall the set Constant of Mini-C constants:

c ::= n                 integers (Int)
    | true | false      booleans

All constants are values.

In Section 5.3 we performed a syntactic transformation where field names were considered
as nullary primitive operations. Therefore, we include the set Field in the set of values.

In the same section, we suggested the approach of using pointer values to model the roots
of mutable records. As we described, there is a pointer ⟨m⟩ for every natural number m. The
set of all pointers is denoted Pointer.

5.5.2  Immutable  ordered  tuples 

Recall that Mini-C includes the n-ary primitive operation tuple for tuple construction and
the operations π_i for tuple-component selection. Therefore, we would like the set of values to
include all ordered tuples of values. The ordered tuple of the n values v1, ..., vn is written

(v1, ..., vn).

5.5.3  The  undefined  value  undef 

As we described in Chapter 2, we demand that the set Val of values include the distinguished
token undef representing the “undefined value”. As we explained, this requirement comes from
the fact that stores are total functions from l-values to values, and thus require such an explicit
representation. For instance, at any given point in a Mini-C program, it is reasonable for only
a small set of variables to be defined, and the store at that point would map all other variables
to undef.


5.5.4 The set of values


Above, we described the different kinds of values in Mini-C. The set Val is their disjoint union
(in order to distinguish pointers from natural-number constants and to distinguish a value from
its unary tuple). It is defined inductively as the smallest set satisfying the following equation.


v ∈ Val = Constant + Field + Pointer + Val* + {undef}


5.6  Semantics  of  Primitive  Operations 


The semantics of primitive operations follows the methodology described in Chapter 2. To
review, the phrase p(v1, ..., vn) ⇒_σ v means that the n-ary primitive operation p applied to
values (v1, ..., vn) in store σ evaluates to value v. However, the only primitive operation whose
evaluation depends on σ (in other words, the only context-dependent operation, as defined in
Definition 2) is deref, so we omit the σ parameter for all other operations.

Recall from Condition 1 that for any n-ary primitive operation p ∈ Primop, for any n values
v1, ..., vn ∈ Val, and for any store σ ∈ Store, there is at least one value v ∈ Val such that
p(v1, ..., vn) ⇒_σ v. In other words, all primitive operations must be defined everywhere. All
primitive operations in Mini-C are also deterministic, as defined in Definition 1; that is,
they evaluate to only one value. Therefore, all Mini-C primitive operations are total
functions.

We define the primitive operations in Mini-C below. We already gave most of these definitions
in Chapter 2. A primitive operation evaluates to undef unless otherwise defined below.
First, we have the constants and field names.

c() ⇒ c
f() ⇒ f

Now the integer operations.

+(n, n') ⇒ (n + n')
-(n, n') ⇒ (n - n')
-(n)     ⇒ -n
*(n, n') ⇒ (n × n')

Next, the boolean operations and if.

&(true, v)       ⇒ v
&(v, true)       ⇒ v
&(false, v)      ⇒ false
&(v, false)      ⇒ false
<(n, n')         ⇒ (n < n')
>(n, n')         ⇒ (n > n')
=(v, v')         ⇒ (v = v')
<>(v, v')        ⇒ (v ≠ v')
if(true, v, v')  ⇒ v
if(false, v, v') ⇒ v'

Next, the operations for immutable tuples.

tuple(v1, ..., vn)  ⇒ (v1, ..., vn)
π_i((v1, ..., vn))  ⇒ vi

Finally, we have the operations to support mutable records.

ptr(n)       ⇒ ⟨n⟩
deref(v, v') ⇒_σ σ(v.v')      (context-dependent)
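The convention that a primitive operation evaluates to undef unless otherwise defined can be illustrated with a small total evaluator. The sketch below covers only some of the context-independent operations; the function names and the use of Python integers, booleans, and tuples as values are assumptions of the example, not part of the formal development.

```python
UNDEF = "undef"


def is_int(v):
    # bool is a subclass of int in Python, so booleans are excluded explicitly
    return isinstance(v, int) and not isinstance(v, bool)


def prim(op, *args):
    """Apply a primitive operation; any case not listed evaluates to undef."""
    if op in ("+", "-", "*", "<", ">") and len(args) == 2 and all(map(is_int, args)):
        n, m = args
        return {"+": n + m, "-": n - m, "*": n * m, "<": n < m, ">": n > m}[op]
    if op == "-" and len(args) == 1 and is_int(args[0]):
        return -args[0]                              # unary negation
    if op == "=" and len(args) == 2:
        a, b = args
        return type(a) is type(b) and a == b         # keep 1 and True distinct
    if op == "if" and len(args) == 3 and isinstance(args[0], bool):
        return args[1] if args[0] else args[2]
    if op == "tuple":
        return tuple(args)
    return UNDEF                                     # defined everywhere


print(prim("+", 2, 3))
print(prim("+", True, 3))
```

Because every call returns some value and never raises, the function is total in the sense required by Condition 1, and it is deterministic because it is an ordinary function.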


5.7  Semantics  of  Expressions  and  L-expressions 


The semantics of expressions and l-expressions are precisely the same as in Chapter 2. We
review that definition here. Formally, the interpretations of expressions and l-expressions are
given by the following relations.

• The phrase l ⊢_σ w means that the l-expression l evaluates in store σ to l-value w.

• The phrase e ⊢_σ v means that the expression e evaluates in store σ to value v.


We recall the following rules, which inductively define these relations.


                          e ⊢_σ v     e' ⊢_σ v'
  x ⊢_σ x                 -----------------------
                          (e.e') ⊢_σ (v.v')


                          ei ⊢_σ vi     p(v1, ..., vn) ⇒_σ v
  x ⊢_σ (σ x)             ------------------------------------
                          p(e1, ..., en) ⊢_σ v


We recall Lemma 1, that all expressions (respectively, l-expressions) evaluate to at least one
value (respectively, l-value). In addition, we know from Lemma 2 that, because Primop is
deterministic, this value (respectively, l-value) is unique.
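The four rules can be read as a pair of mutually recursive functions. The following sketch assumes a hypothetical encoding: stores are Python dictionaries from l-values to values (missing keys stand for undef), l-values are either variable names or pairs, pointers are written ("ptr", m), and only two arithmetic primitives are provided.

```python
UNDEF = "undef"
PRIMS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}   # a tiny Primop


def eval_lexp(l, store):
    """Evaluate an l-expression to an l-value."""
    if l[0] == "var":                       # a variable denotes itself as an l-value
        return l[1]
    if l[0] == "ref":                       # e.e' evaluates to the l-value v.v'
        return (eval_exp(l[1], store), eval_exp(l[2], store))
    raise ValueError(l)


def eval_exp(e, store):
    """Evaluate an expression to a value."""
    kind = e[0]
    if kind == "var":                       # look the variable up in the store
        return store.get(e[1], UNDEF)
    if kind == "const":                     # constants and field names
        return e[1]
    if kind == "deref":                     # the one context-dependent primitive
        return store.get(eval_lexp(("ref", e[1], e[2]), store), UNDEF)
    if kind == "prim":                      # evaluate the arguments, then apply p
        return PRIMS[e[1]](*[eval_exp(a, store) for a in e[2:]])
    raise ValueError(e)


store = {"a": ("ptr", 1), (("ptr", 1), "cdr"): 7, "n": 3}
print(eval_exp(("deref", ("var", "a"), ("const", "cdr")), store))
```

Since each function returns exactly one result for each input, the sketch reflects the uniqueness property recalled from Lemma 2.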
5.8 Transition-system Semantics

In this section we present a transition system that models the executions of Mini-C programs.
We consider two ways to define such a system.

• The usual approach is to give one or more meta-rules for each kind of syntactic form in the
language (for instance, conditional, assignment, and so forth). These meta-rules describe how
any occurrence of that syntactic form in the source program induces single-step transitions.
The single-step transitions for the entire program are the collection (union) of all of these
transitions.

• The second approach is technically equivalent, but uses the framework that we developed
in Chapters 2, 3, and 4, thus opening up all the possibilities of our program-analysis
methodology for the language. The idea is that each rule introduced by a meta-rule in
the above approach is equivalent to a single transfer relation that describes all of the
possible transitions induced by that rule. Hence, the approach is to give a rule for each
kind of syntactic form in the language that describes how any occurrence of that form
induces a transfer relation describing all of the possible single-step transitions for that
occurrence.

We will illustrate each of these approaches in turn for Mini-C. But first, we need a helper
function to manage control points.

5.8.1  The  next  function 

In most languages, much of the control flow is syntactically apparent. Conceptually, the
dynamic semantics of a language should not have to be concerned with syntactically apparent
information. Of course, the program’s flow of control must be part of the program’s semantics,
or else the semantics would not adequately model the program’s execution. But for expository
purposes, it is pleasing to factor out information that is a trivial property of the syntax, so
that the rules of the semantics themselves succinctly capture exactly the dynamic properties of
execution.

To this end, we will need a helper function next to manage the syntactically apparent control
flow in a program. Given the control point C of a statement in program S, if C is in the middle
of a statement list then next merely returns the control point of the next statement in the list.
Note that if C points to a conditional or a while loop then the next statement is not necessarily
the next control point in the execution.

next_S(C.i) = C.(i+1)    if S[C.(i+1)] defined

If C points to a statement that is the last in a statement list, then next_S(C) will not be defined
by the above equation. There are two cases for when this might happen. The first case is that
the statement to which C points is the last statement in the outermost statement list in S
(i.e., the statement list S itself). In this case next returns the empty control point ε to signal
program completion.

next_S(i) = ε

The second case is that the statement to which C points is not in the outermost statement
list. In that case, we want next_S(C) to return the control point of the next statement to be
executed, which in this language is simply in a lexically enclosing statement list and is always
syntactically apparent. There are two subcases. If C points to the last statement in an arm of a
conditional statement s, then next_S(C) should be the next statement after s.

next_S(C.j.i) = next_S(C)    if S[C] = if ... then ... else ...

The second subcase is that C points to the last statement in a while loop s, in which case
next_S(C) should be s itself, as the loop might need to be executed again.

next_S(C.i) = C    if S[C] = while ... do ...


The following example demonstrates all of the concepts of the next function.

Example 22 Consider the Mini-C program S in Example 21, repeated below with its control
points. The complete definition of next is:

next_S(0)     = 1
next_S(1)     = 2
next_S(1.1.0) = 1.1.1
next_S(1.1.1) = 2
next_S(1.2.0) = 1.2.1
next_S(1.2.1) = 2
next_S(2)     = 3
next_S(3)     = 4
next_S(3.0)   = 3.1
next_S(3.1)   = 3.2
next_S(3.2)   = 3
next_S(4)     = 5
next_S(4.1.0) = 4.1.1
next_S(4.1.1) = 5
next_S(4.2.0) = 5
next_S(5)     = ε

        {
1           if n < 0 then
1.1.1           n := -(n)
            else
1.2.1           n := n * 2;
2           r := 1;
3           while n > 1 do
            {
3.1             r := r * n;
3.2             n := n - 1
            };
4           if r > 60 then
4.1.1           r := r - 1;
5           x := r
        }

Note from this example that for every statement list, there is an element in the domain of next
ending with 0 and thus not a real control point in the program. The only reason for this is
that we allow empty statement lists in Mini-C programs. So, for instance, if control point
C points to a conditional statement, then next_S(C.2.0) always returns the next statement in
an execution that takes the else arm. So, next_S(1.2.0) = 1.2.1, which is the first (and only)
statement in the else arm of the first conditional in the example program, but next_S(4.2.0) = 5,
because the else arm of the second conditional is empty, and thus execution immediately
proceeds to the statement after the conditional.
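The equations for next translate fairly directly into code. The sketch below reuses a hypothetical nested-tuple encoding of Example 21 (an assumption of this illustration), represents control points as tuples of integers with the empty tuple standing for ε, and uses the invented names `lookup` and `next_cp`.

```python
# ("assign", l, e) | ("if", e, S1, S2) | ("while", e, S): a hypothetical encoding
PROGRAM = [
    ("if", "n < 0", [("assign", "n", "-(n)")], [("assign", "n", "n * 2")]),
    ("assign", "r", "1"),
    ("while", "n > 1", [("assign", "r", "r * n"), ("assign", "n", "n - 1")]),
    ("if", "r > 60", [("assign", "r", "r - 1")], []),
    ("assign", "x", "r"),
]


def lookup(stmts, cp):
    """Return the statement S[C], or None where S[C] is undefined."""
    if not cp or not (1 <= cp[0] <= len(stmts)):
        return None
    s, rest = stmts[cp[0] - 1], cp[1:]
    if not rest:
        return s
    if s[0] == "if" and rest[0] in (1, 2):
        return lookup(s[1 + rest[0]], rest[1:])
    if s[0] == "while":
        return lookup(s[2], rest)
    return None


def next_cp(stmts, cp):
    """Syntactically apparent successor of control point cp."""
    prefix, i = cp[:-1], cp[-1]
    succ = prefix + (i + 1,)
    if lookup(stmts, succ) is not None:      # middle of a statement list
        return succ
    if not prefix:                           # end of the outermost list
        return ()
    parent = lookup(stmts, prefix)
    if parent is not None and parent[0] == "while":
        return prefix                        # end of a loop body: back to the loop
    return next_cp(stmts, prefix[:-1])       # end of an arm: continue after the if


print(next_cp(PROGRAM, (3, 2)))
print(next_cp(PROGRAM, (4, 2, 0)))
```

Applied to the sixteen arguments listed in Example 22, the function is intended to reproduce that table, including the entries ending in 0 that exist only to handle empty statement lists.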
5.8.2 Transition system via meta-rules

Now we can define the meta-rules that define the transition relation ↦ for a Mini-C program
S. There are five kinds of transitions. Each statement in S of the form

if e then S else S'

induces two families of transitions: the transitions from a successful test of e into S, and the
transitions from a failed test of e into S'.

S[C] = if e then ... else ...    C' = next_S(C.1.0)    e ⊢_σ true
------------------------------------------------------------------
                        (C, σ) ↦ (C', σ)

S[C] = if e then ... else ...    C' = next_S(C.2.0)    e ⊢_σ false
-------------------------------------------------------------------
                        (C, σ) ↦ (C', σ)

Each statement in S of the form

while e do S'

induces two families of transitions: the transitions from a successful test of e into S', and the
transitions from a failed test of e to the rest of the code after the loop.

S[C] = while e do ...    C' = next_S(C.0)    e ⊢_σ true
--------------------------------------------------------
                   (C, σ) ↦ (C', σ)

S[C] = while e do ...    C' = next_S(C)    e ⊢_σ false
-------------------------------------------------------
                   (C, σ) ↦ (C', σ)

Finally, each statement in S of the form

l := e

induces a family of transitions that perform the assignment in the store.

S[C] = (l := e)    C' = next_S(C)    l ⊢_σ w    e ⊢_σ v
--------------------------------------------------------
              (C, σ) ↦ (C', σ[w ↦ v])

Below is an example that illustrates how this transition system models program execution.

Example 23 Recall the Mini-C program in Example 21. The transition system defines the
following execution from a state at the beginning of the program and with an initial store in
which n is bound to -4 and all other l-values are bound to undef. (The only store mappings
shown are the mappings to non-undef values.)

    (1,     {n ↦ -4})
  ↦ (1.1.1, {n ↦ -4})
  ↦ (2,     {n ↦ 4})
  ↦ (3,     {n ↦ 4, r ↦ 1})
  ↦ (3.1,   {n ↦ 4, r ↦ 1})
  ↦ (3.2,   {n ↦ 4, r ↦ 4})
  ↦ (3,     {n ↦ 3, r ↦ 4})
  ↦ (3.1,   {n ↦ 3, r ↦ 4})
  ↦ (3.2,   {n ↦ 3, r ↦ 12})
  ↦ (3,     {n ↦ 2, r ↦ 12})
  ↦ (3.1,   {n ↦ 2, r ↦ 12})
  ↦ (3.2,   {n ↦ 2, r ↦ 24})
  ↦ (3,     {n ↦ 1, r ↦ 24})
  ↦ (4,     {n ↦ 1, r ↦ 24})
  ↦ (5,     {n ↦ 1, r ↦ 24})
  ↦ (ε,     {n ↦ 1, r ↦ 24, x ↦ 24})
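The five meta-rules can be animated with a small interpreter for the example program. In this sketch, which is an illustration and not the formal semantics, the next table of Example 22 is written out as a dictionary over dotted strings (with the empty string standing for ε), expressions are Python functions of the store, and the names `NEXT`, `STMTS`, `step`, and `run` are hypothetical.

```python
NEXT = {"0": "1", "1": "2", "1.1.0": "1.1.1", "1.1.1": "2", "1.2.0": "1.2.1",
        "1.2.1": "2", "2": "3", "3": "4", "3.0": "3.1", "3.1": "3.2",
        "3.2": "3", "4": "5", "4.1.0": "4.1.1", "4.1.1": "5", "4.2.0": "5",
        "5": ""}

STMTS = {
    "1": ("if", lambda s: s["n"] < 0),
    "1.1.1": ("assign", "n", lambda s: -s["n"]),
    "1.2.1": ("assign", "n", lambda s: s["n"] * 2),
    "2": ("assign", "r", lambda s: 1),
    "3": ("while", lambda s: s["n"] > 1),
    "3.1": ("assign", "r", lambda s: s["r"] * s["n"]),
    "3.2": ("assign", "n", lambda s: s["n"] - 1),
    "4": ("if", lambda s: s["r"] > 60),
    "4.1.1": ("assign", "r", lambda s: s["r"] - 1),
    "5": ("assign", "x", lambda s: s["r"]),
}


def step(cp, store):
    """One transition (C, store) to (C', store'), following the meta-rules."""
    stmt = STMTS[cp]
    if stmt[0] == "if":
        arm = ".1.0" if stmt[1](store) else ".2.0"
        return NEXT[cp + arm], store
    if stmt[0] == "while":
        return (NEXT[cp + ".0"] if stmt[1](store) else NEXT[cp]), store
    _, var, rhs = stmt
    return NEXT[cp], {**store, var: rhs(store)}      # store update for l := e


def run(cp, store):
    """Collect the states visited until the empty control point is reached."""
    trace = [(cp, store)]
    while cp != "":
        cp, store = step(cp, store)
        trace.append((cp, store))
    return trace


for state in run("1", {"n": -4}):
    print(state)
```

Started at control point 1 with n bound to -4, the interpreter visits the same sixteen states as the execution in Example 23.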

5.8.3  Transition  system  via  transfer  relations 

The key idea of using transfer relations to replace meta-rules is that a single transfer relation
can capture the commonalities inherent in each family of transitions defined by the meta-rules.
For instance, in the meta-rule approach, every if e then S else S' statement induces an infinite
family of transitions, one for each store σ in which e evaluates to true, from that statement
into S. Each transition in this infinite family does exactly the same thing: simply test that e
is true in the store on the left-hand side of the transition before proceeding to S.

This inspires the idea of defining a transfer relation Δ_{C,C'} for every pair C, C' ∈ CtrlPoint of
control points in a Mini-C program S. Each one will specify exactly the transitions, as defined
by the meta-rules above, from C to C'. In other words,

(C, σ) ↦ (C', σ')  iff  σ Δ_{C,C'} σ'.

Of course, the vast majority of these transfer relations will be the empty relation ∅, because
single-step transitions between most pairs of control points are impossible.

The semantics of a Mini-C program S is thus defined as a finite table

Δ = {Δ_{C,C'} | C, C' ∈ CtrlPoint}

of transfer relations, one for each pair of control points in S, such that Δ_{C,C'} describes all of the
transitions between control point C and the control point C'. This table is defined as follows.


Δ_{C,C'} =

    e? •      if S[C] = (if e then ... else ...) and C' = next_S(C.1.0)
    e¿ •      if S[C] = (if e then ... else ...) and C' = next_S(C.2.0)
    e? •      if S[C] = (while e do ...) and C' = next_S(C.0)
    e¿ •      if S[C] = (while e do ...) and C' = next_S(C)
    l ↦ e     if S[C] = (l := e) and C' = next_S(C)
    ∅         otherwise

where e? Δ is short for e? Δ | ∅ and e¿ Δ is short for e? ∅ | Δ, and where ∅ is the
empty relation and • is the identity relation (i.e., empty parallel assignment).


Example 24 The semantics defines 13 non-empty transfer relations for the Mini-C program
in Example 21.

Δ_{1,(1.1.1)}     = n < 0? •
Δ_{(1.1.1),2}     = n ↦ -(n)
Δ_{1,(1.2.1)}     = n < 0¿ •
Δ_{(1.2.1),2}     = n ↦ n * 2
Δ_{2,3}           = r ↦ 1
Δ_{3,(3.1)}       = n > 1? •
Δ_{(3.1),(3.2)}   = r ↦ r * n
Δ_{(3.2),3}       = n ↦ n - 1
Δ_{3,4}           = n > 1¿ •
Δ_{4,(4.1.1)}     = r > 60? •
Δ_{(4.1.1),5}     = r ↦ r - 1
Δ_{4,5}           = r > 60¿ •
Δ_{5,ε}           = x ↦ r
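The table of single-step transfer relations can be generated mechanically from the statements and the next function. The sketch below is an assumption-laden illustration: it hard-codes the next table of Example 22, keeps expressions as strings, and describes each relation by a tagged tuple, ("test", e, True) for a successful test of e, ("test", e, False) for a failed test, and ("assign", l, e) for an assignment.

```python
NEXT = {"0": "1", "1": "2", "1.1.0": "1.1.1", "1.1.1": "2", "1.2.0": "1.2.1",
        "1.2.1": "2", "2": "3", "3": "4", "3.0": "3.1", "3.1": "3.2",
        "3.2": "3", "4": "5", "4.1.0": "4.1.1", "4.1.1": "5", "4.2.0": "5",
        "5": ""}

STMTS = {
    "1": ("if", "n < 0"), "1.1.1": ("assign", "n", "-(n)"),
    "1.2.1": ("assign", "n", "n * 2"), "2": ("assign", "r", "1"),
    "3": ("while", "n > 1"), "3.1": ("assign", "r", "r * n"),
    "3.2": ("assign", "n", "n - 1"), "4": ("if", "r > 60"),
    "4.1.1": ("assign", "r", "r - 1"), "5": ("assign", "x", "r"),
}


def transfer_table(stmts, nxt):
    """Map each pair (C, C') that has a non-empty relation to its description."""
    table = {}
    for cp, stmt in stmts.items():
        if stmt[0] == "if":
            table[(cp, nxt[cp + ".1.0"])] = ("test", stmt[1], True)
            table[(cp, nxt[cp + ".2.0"])] = ("test", stmt[1], False)
        elif stmt[0] == "while":
            table[(cp, nxt[cp + ".0"])] = ("test", stmt[1], True)
            table[(cp, nxt[cp])] = ("test", stmt[1], False)
        else:
            table[(cp, nxt[cp])] = ("assign", stmt[1], stmt[2])
    return table


table = transfer_table(STMTS, NEXT)
for (src, dst), rel in sorted(table.items()):
    print(src, "->", dst or "ε", rel)
```

Every pair of control points that is absent from the dictionary corresponds to the empty relation. One limitation of this simple encoding is that a conditional whose two arms are both empty would map both tests to the same pair of control points; the example program has no such conditional.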

Note that this semantics does not need any of the mechanism developed in Chapter 3 for
composing and manipulating transfer relations. Indeed, the transfer relations that it yields
as the model of program execution are quite simple. But the intent is that the output of the
semantics is merely a first step in an application of our program-analysis methodology. Once the
single-step transfer relations of a Mini-C program are in hand, one can compose these transfer
relations to yield a single compound relation that expresses the behavior of any finite segment
of execution. The following example illustrates that composing single-step transfer relations is
analogous to stringing together transitions defined by the meta-rules in the previous section.

Example 25 The execution shown in Example 23 has control path

1, (1.1.1), 2, 3, (3.1), (3.2), 3, (3.1), (3.2), 3, (3.1), (3.2), 3, 4, 5, ε

and so its compound transfer relation is

Δ_{1,(1.1.1)}; Δ_{(1.1.1),2}; Δ_{2,3}; ···; Δ_{3,4}; Δ_{4,5}; Δ_{5,ε}

which relates the input store

{n ↦ -4}

to the output store

{n ↦ 1, r ↦ 24, x ↦ 24}.


Now, one can use the composition algorithm ⊙ of Chapter 3 to perform these compositions
effectively, thereby facilitating the analysis of the program. Recall the convention of
Chapter 4 of writing Δ_Γ to represent the transfer relation of control path Γ, which is a list of
control points. Recall that

Δ_{Γ,C,Γ'} = Δ_{Γ,C} ⊙ Δ_{C,Γ'}.

Example 26 A transfer relation expressing the above execution is computed as

Δ_{1,(1.1.1)} ⊙ Δ_{(1.1.1),2} ⊙ Δ_{2,3} ⊙ ··· ⊙ Δ_{3,4} ⊙ Δ_{4,5} ⊙ Δ_{5,ε}

which, if the symbolic evaluation P for primitive operations and C for conditional relations are
both simply the identity function, is

(n < 0)? e1? e2? e3? e4¿ (e' > 60)¿  n, r, x ↦ e, e', e'

where

e1 = -(n) > 1
e2 = (-(n) - 1) > 1
e3 = ((-(n) - 1) - 1) > 1
e4 = e > 1
e  = ((-(n) - 1) - 1) - 1
e' = ((1 * -(n)) * (-(n) - 1)) * ((-(n) - 1) - 1)

Adding some logic to P to simplify arithmetic operations could (in principle) yield a result as
simple as

e1 = n < -1
e2 = n < -2
e3 = n < -3
e4 = n < -4
e  = -(n) - 3
e' = (-(n) * (-(n) - 1)) * (-(n) - 2)

Note how the conditional relation expresses the control-flow constraints on this particular control
path. Note also that the conjunction of the first five conditions in the above transfer relation can
only evaluate to true when n is -4. A C algorithm could (in principle) determine this property
automatically, dispense with e1 through e4, pass this value of n onto a simple constant-folding
symbolic evaluation of e and e', and achieve the extremely simple transfer relation


n, r, x ↦ 1, 24, 24


Such sophisticated symbolic reasoning about integer arithmetic and conjoined comparison tests
may be difficult to achieve in general. For the most part, we leave this topic as an open issue
and offer no general algorithms. However, even simple symbolic evaluations can go far whenever
any initial bindings are known. The following example illustrates how this works.


Example 27 Suppose that C is the identity function, performing no simplification of
conditional expressions, and P merely performs constant folding. Then

(n ↦ -4) ⊙ Δ_{1,(1.1.1)} ⊙ Δ_{(1.1.1),2} ⊙ Δ_{2,3} ⊙ ··· ⊙ Δ_{3,4} ⊙ Δ_{4,5} ⊙ Δ_{5,ε}

is the transfer relation

n, r, x ↦ 1, 24, 24


So far, we have introduced only a single example Mini-C program. This example uses only
integer data, and in particular does not allocate or use records. However, much of the
sophistication of our methodology lies in its treatment of heap-allocated mutable data structures.
The following example demonstrates record allocation.

Example 28 Consider the Mini-C program

1   while a <> nil do
    {
        new := {car = a.car, cdr = b};
        a := a.cdr;
        b := new
    }

that constructs a reverse of list a. Let Γ be the control path that starts at control point 1,
progresses through one iteration of the loop, and ends back at control point 1. Then,

Δ_Γ = (a <> nil)?  a, b, new, H, ⟨H⟩.car, ⟨H⟩.cdr ↦ a.cdr, ⟨H⟩, ⟨H⟩, H + 1, a.car, b

represents the transfer relation of one iteration and

Δ_{Γ;Γ} = (a <> nil)? (a.cdr <> nil)? Δ

where

Δ = [definition not recoverable from the source text]

represents the transfer relation of two adjacent iterations.

Note how an assignment relation can represent multiple record allocations in parallel via the
expressions ⟨H⟩, ⟨H + 1⟩, ⟨H + 2⟩, and so forth. These expressions evaluate to sequential free
pointers. The assignment to H reflects the total number of pointers allocated by the execution
segment that the transfer relation models.

Also, note again that the conditional relations encode the conditions under which a particular
control path is taken.

The following example demonstrates the subtlety of unknown initial aliasing.

Example 29 Consider the Mini-C program

1   while (a <> nil) do
    {
        temp := a;
        a := a.cdr;
        temp.cdr := b;
        b := temp
    }

that destructively appends the reverse of list a onto list b. If Γ is the control path that begins at
control point 1, progresses through one iteration of the loop, and ends back at control point 1,
then

Δ_Γ = (a <> nil)?  a, b, temp, a.cdr ↦ a.cdr, a, a, b

is the transfer relation of one loop iteration, and

Δ_{Γ;Γ} = (a <> nil)? (a.cdr <> nil)? (a = a.cdr)? Δ | Δ'

where

Δ  = a, b, temp, a.cdr ↦ if(a = a.cdr, b, a.cdr.cdr), a.cdr, a.cdr, a
Δ' = a, b, temp, a.cdr, a.cdr.cdr ↦ if(a = a.cdr, b, a.cdr.cdr), a.cdr, a.cdr, b, a

is the transfer relation of two adjacent loop iterations. If C were defined to propagate the first
test of a = a.cdr into its two branches, then the composition algorithm could simplify the if
expressions and achieve

Δ  = a, b, temp, a.cdr ↦ b, a.cdr, a.cdr, a
Δ' = a, b, temp, a.cdr, a.cdr.cdr ↦ a.cdr.cdr, a.cdr, a.cdr, b, a
This example is worth some study. It expresses that the net effect of executing two adjacent
iterations of the loop in some context (store) depends on whether a was aliased to a.cdr in
that context (in other words, if a is a circular list of length one). If not, then the execution
has the net effect of Δ', which directly expresses the expected net behavior of two iterations
of a reverse-append routine. On the other hand, if a is initially aliased to a.cdr, then some
examination of Δ reveals that the net effect of two iterations reduces to swapping a and b.

This example thus demonstrates that the effect of aliasing on data-structure dereference
and destructive assignment is rather subtle and unpredictable, but the composition operation
⊙ on transfer relations reveals this subtlety. Furthermore, it suggests that it is well worth the
effort to design the C algorithm to look for and simplify syntactically redundant conditional
expressions. In this dissertation, we do not describe such an advanced C algorithm, and so this
is left for future work.
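The two aliasing cases can be checked on concrete stores by simply executing the loop body. In this sketch, which makes no use of transfer relations, pointers are represented by arbitrary strings such as "p" and "q", stores are dictionaries over variables and (pointer, field) pairs, and the function name `iterate` is hypothetical.

```python
def iterate(store):
    """Run one iteration of the loop body of Example 29 on a concrete store."""
    s = dict(store)
    s["temp"] = s["a"]                  # temp := a
    s["a"] = s[(s["a"], "cdr")]         # a := a.cdr
    s[(s["temp"], "cdr")] = s["b"]      # temp.cdr := b
    s["b"] = s["temp"]                  # b := temp
    return s


# Case 1: a is a circular list of length one, so a is aliased to a.cdr.
aliased = {"a": "p", "b": "q", ("p", "cdr"): "p"}
once = iterate(aliased)
twice = iterate(once)
print(twice["a"], twice["b"], twice[("p", "cdr")])

# Case 2: an ordinary two-element list p1 -> p2 -> nil, with b initially nil.
plain = {"a": "p1", "b": "nil", ("p1", "cdr"): "p2", ("p2", "cdr"): "nil"}
rev = iterate(iterate(plain))
print(rev["a"], rev["b"], rev[("p2", "cdr")], rev[("p1", "cdr")])
```

In the aliased case the two iterations leave the cell unchanged and exchange the values of a and b, while in the ordinary case they reverse the two cells onto b, in line with the discussion of the two relations above.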


5.9  Modeling  &  and  Pointer  Arithmetic 

The only reason that we did not include arrays in Mini-C was for simplicity. In Chapter 7
we will show how to model arrays in a functional language with our methodology, and it is
straightforward to extend Mini-C in the same manner. In this section, we give a discussion of
how to add some of the features of C’s pointers that are not present in Mini-C. For generality,
this section will assume that Mini-C includes arrays as described in Chapter 7.

C includes the following expressions.

&x         the address of variable x
&(s.f)     the address of field f of struct s
&(a[i])    the address of element i of array a

Unlike C, Mini-C has no & operator. Related to this is our choice to treat pointers as records
with the single field *.

Alternatively, we could have treated a pointer as a pair value (v, v') representing the
l-value v.v'. In this way, we can treat the latter two of the three cases above via the syntactic
translation

&(e.e') ⇝ (e, e')

where e[e'] is represented as e.e', as we describe in Chapter 7. Then, we would treat an
occurrence of *e not as a reference (as an l-expression) or dereference (as an expression) of the
field named * of the record e, but rather as an extraction of the l-value represented by the pair
e. This would be accomplished by the following syntactic transformation.

*e ⇝ (π1(e)).(π2(e))

Recall that an occurrence of e.e' as an expression (as opposed to an l-expression) is short for
deref(e, e').
In this manner, we can treat most of the functionality of C’s & and * operators. What we
cannot do is to take the address of a variable, and there is a good reason why this is the case.
Intuitively, & coerces an l-value into a value (so that it may be manipulated as data and so
forth), and * coerces a value back into an l-value. Above, we coerce a reference l-value v.v' as
the pair value (v, v'), which may then be coerced back into the l-value v.v' (and dereferenced,
if treated as an expression). We could attempt a similar approach with variables, for instance
by coercing the variable x into the token ‘x’. In our current formulation, the only l-expression
that evaluates to x is x itself, and so there is no way to translate an arbitrary expression e
into an l-expression that will evaluate to x if e evaluates to ‘x’. But this is just due to our
particular language of expressions and l-expressions and our choice to model Mini-C variables
with l-expression variables. In principle, there is no difficulty in extending the notion of & to
variables.

With the above model of pointers, it is possible to model C’s pointer arithmetic for array
indices. A pointer to an array element is (v, v'), where v is the array and v' is the index of the
element (as we explain in Chapter 7). Suppose the expression

e  t  e' 

represents the increment of pointer e by e' (which would be written as e + e' in C). We would
translate this into the Mini-C expression


(π1(e), π2(e) + e').
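The pair model of pointers is easy to try out concretely. The sketch below assumes stores are dictionaries over (value, index) pairs, uses the string "arr" as a stand-in for an array value, and introduces the hypothetical helper names `address_of`, `load`, and `ptr_add`.

```python
UNDEF = "undef"


def address_of(v, w):
    """&(e.e'): represent the l-value v.w as the pair value (v, w)."""
    return (v, w)


def load(ptr, store):
    """*e used as an expression: dereference the l-value that the pair denotes."""
    return store.get((ptr[0], ptr[1]), UNDEF)


def ptr_add(ptr, k):
    """Pointer increment: keep the first component, add k to the index."""
    return (ptr[0], ptr[1] + k)


store = {("arr", 0): 10, ("arr", 1): 20, ("arr", 2): 30}
p = address_of("arr", 0)
print(load(p, store))
print(load(ptr_add(p, 2), store))
print(load(ptr_add(p, 5), store))
```

Incrementing past the defined elements yields undef rather than an error, which is consistent with stores being total functions.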


Chapter  6 


First-Class  Functions:  The  Language 

Pure 


In  Chapter  5  we  presented  the  imperative  while- loop  langnage  MlNl-C,  the  primary  pnrpose  of 
which  was  to  introdnce  the  methodology  of  defining  a  transition-system  semantics  of  a  program¬ 
ming  langnage  with  compnter-representable  transfer  relations  representing  the  single  steps,  and 
then  nsing  the  ©  algorithm  to  bnild  mnltiple  steps  corresponding  to  particnlar  control  paths 
in  the  program.  Bnt  MlNI-C  is  a  rather  simple  langnage,  and  so  in  this  chapter  we  consider 
more  advanced  langnage  featnres.  Onr  pnrpose  is  to  demonstrate  that  onr  methodology  of 
semantics-based  program  analysis  is  reasonably  general. 

The  only  control  constrncts  in  MlNI-C  are  conditionals  and  while  loops.  One  can  get  by 
withont  any  other  control  constrncts,  bnt  it  wonld  be  qnite  inconvenient  for  most  programming 
tasks.  Real  programming  langnages  have  some  mechanism  for  defining  fnnctions.  A  fnnction 
accepts  some  inpnt  data  (parameters)  from  its  caller  and  retnrns  a  resnlt  valne  to  the  caller. 
In  some  langnages,  snch  as  Haskell  [H"''92],  fnnctions  have  the  same  inpnt-ontpnt  behavior  in 
any  context.  This  is  sometimes  known  as  referential  transparency  [SS90].  We  call  this  kind 
of  fnnction  “pnre”.  The  vast  majority  of  programming  langnages,  however,  provide  impnre 
fnnctions.  In  this  chapter  we  model  a  programming  langnage  with  pnre  fnnctions,  and  in  the 
next  chapter  we  will  extend  this  langnage  with  imperative  featnres  and  impnre  fnnctions. 

In some languages, the functions are said to be first class. This means that the functions
are semantic values, and as such can be manipulated by a program like any other value. For
instance, they may be assigned to variables, placed in data structures, and passed to other
functions. The functions in most advanced languages, such as Scheme [ReC86] or Standard ML
(SML) [MTH90], are first class. In contrast, the functions in C [KR78], Fortran [Knu71], and
Pascal [Bar81] are not first class.

In some ways, our methodology must be rather stretched to handle first-class functions. We
will see this below and in Chapter 7, and we will give a summary at the end of Chapter 7.


6.1  Substitution vs. Closures

Consider the syntax of λ-calculus terms [Bar84]:

\[ e ::= x \mid \lambda x.\, e \mid e\ e \]

Now consider the following term:

\[ (\lambda x.\, \lambda y.\, x)\ (\lambda z.\, z) \]

Via the reduction rules of the λ-calculus, this term reduces in a single step to the (unique up
to renaming) normal form:

\[ \lambda y.\, \lambda z.\, z \]

This term represents a function that, given any argument, yields the identity function.

Much of programming-language theory and practice is based on the notion that reduction
of λ-calculus terms is a kind of computation. Even further, the Curry-Howard isomorphism
[How80] introduces the connection between proof theory and computation, and consequently
between logical systems and programming languages. (We refer the curious reader to
[GLT89].) Let us consider how the above λ-calculus term might correspond to an SML program
(chosen rather arbitrarily, simply as an example of a “real” language), and how its reduction
might correspond to the execution of the program. The SML program

    (fn x => (fn y => x)) (fn z => z)

corresponds to the λ-calculus term above; indeed, the syntax trees of the two terms are
isomorphic. Now, consider the execution of this SML program. Everyone who has written SML
programs imagines that the execution of this program will proceed something like this:

1. Evaluate (fn x => (fn y => x)).

   (a) Create a closure f in the heap for (fn x => (fn y => x)).

   (b) Return f as the result of evaluation.

2. Evaluate (fn z => z).

   (a) Create a closure g in the heap for (fn z => z).

   (b) Return g as the result of evaluation.

3. Apply f to g.

   (a) Bind x to g.

   (b) Create a closure h in the heap for (fn y => x) with this binding of x.

   (c) Return h as the result of evaluation.

4. Return h as the result of evaluation.

There seems to be much more going on in the execution of the SML program than in the
single-step reduction from

\[ (\lambda x.\, \lambda y.\, x)\ (\lambda z.\, z) \]

to

\[ \lambda y.\, \lambda z.\, z. \]

The overarching reason is that, as we have suggested, reduction of λ-terms is an abstract notion
of computation, while SML programs execute on real computers. The salient point that this
example illustrates is that each step of a λ-term reduction builds a whole new term, but it
is infeasible to build literally an entire SML program over and over during execution. More
specifically, a reduction of a λ-term substitutes the argument of a function for every occurrence
of the parameter of the function in the body of the function. In the above example, the reduction
substitutes the literal term $\lambda z.\, z$ for the single occurrence of x in $\lambda y.\, x$ to build the final
normal-form term. Theoretically, an implementation of a real programming language could be
based on a similar idea. But in practice, it is usually more efficient to build closures instead
of performing substitution.
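
The contrast between the two evaluation strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (Var, Lam, App, Closure) and the two evaluator functions are hypothetical, terms are assumed to be closed, and the substitution function does no renaming because it is only ever given closed arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def subst(term, name, value):
    """Replace free occurrences of `name` in `term` by the closed term `value`."""
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:  # `name` is shadowed under this binder
            return term
        return Lam(term.param, subst(term.body, name, value))
    return App(subst(term.fn, name, value), subst(term.arg, name, value))


def reduce_by_substitution(term):
    """Call-by-value reduction of a closed term: each beta step builds a new term."""
    if isinstance(term, App):
        fn = reduce_by_substitution(term.fn)
        arg = reduce_by_substitution(term.arg)
        return reduce_by_substitution(subst(fn.body, fn.param, arg))
    return term


@dataclass(frozen=True)
class Closure:
    lam: Lam
    env: tuple  # immutable sequence of (name, value) bindings


def eval_with_closures(term, env=()):
    """Evaluate a closed term without rewriting it: variables are looked up in `env`."""
    if isinstance(term, Var):
        return dict(env)[term.name]
    if isinstance(term, Lam):
        return Closure(term, env)
    fn = eval_with_closures(term.fn, env)
    arg = eval_with_closures(term.arg, env)
    return eval_with_closures(fn.lam.body, fn.env + ((fn.lam.param, arg),))


term = App(Lam("x", Lam("y", Var("x"))), Lam("z", Var("z")))
normal_form = reduce_by_substitution(term)  # the term  \y. \z. z
h = eval_with_closures(term)                # the closure for (fn y => x) with x bound to g
```

The substitution-based evaluator returns a new term, whereas the closure-based evaluator returns the original text (fn y => x) paired with a binding for x, which mirrors steps 1 through 4 of the informal SML execution.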

The difference between closures and substitution lies in the treatment of variables. A
programmer is accustomed to thinking of variables as identifiers that are bound to values when
the program runs. This deeply ingrained notion that a variable “has a value” is partially an
artifact of this implementation issue, and is supported by the standard denotational model of
the λ-calculus, which models a term $\lambda x.e$ as a continuous function [Sto77]. In the reduction
of λ-terms, one must consider variables in a different light; they are placeholders that during
reduction (execution) are replaced with terms and disappear entirely.

Because  the  overarching  goal  of  program  analysis  is  to  determine  information  about  the 
run-time  behavior  of  programs,  one  must  begin  by  modeling  the  programming  language  in  a 
way  that  reflects  or  abstracts  this  run-time  behavior.  Therefore,  because  real  implementations 
typically  use  closures  to  model  first-class  functions,  we  will  model  functions  with  closures  in 
our  semantics. 


6.2  Syntax 


We  now  present  the  purely  functional  language  called  Pure.  A  program  is  a  member  of  the 
set  Term  of  terms,  written  in  a  brand  of  continuation-passing  style  [LD93]. 


\[
\begin{array}{rcll}
t & ::= & \mathtt{let}\ x = e\ \mathtt{in}\ t & \text{local binding} \\
  & \mid & \mathtt{rec}\ g\ \mathtt{in}\ t & \text{recursive function binding} \\
  & \mid & e(\vec{e}) & \text{function application} \\
  & \mid & \mathtt{if}\ e\ \mathtt{then}\ t\ \mathtt{else}\ t' & \text{conditional} \\
  & \mid & e & \text{simple term (Exp)} \\[4pt]
g & ::= & x(\vec{y}) = t' & \text{$n$-ary function definitions}
\end{array}
\]

A simple term is an expression as defined in Chapter 2.

\[
\begin{array}{rcll}
e & ::= & x & \text{variable lookup} \\
  & \mid & p(e_1, \ldots, e_n) & \text{primitive application} \\[4pt]
x, y, z & \in & \mathit{Var} & \text{variables}
\end{array}
\]

Usually, we use x for a function name or let binding, y for a function parameter, and z to refer
to a free variable of a term t, the set of which is denoted by FV(t) and defined in the usual
fashion. Expressions that appear in Pure terms may use the following primitive operations,
which are members of Primop.


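\[
\begin{array}{rcll}
p & ::= & c & \text{constants (nullary)} \\
  & \mid & +,\ -,\ * & \text{integer operations (binary)} \\
  & \mid & <,\ >,\ =,\ \le,\ \ge & \text{boolean operations (binary)} \\
  & \mid & \mathtt{tuple} & \text{ordered-tuple construction ($n$-ary)} \\
  & \mid & \pi_i & \text{ordered-tuple component selection (unary)}
\end{array}
\]

Constants are the integers and booleans, as in Mini-C.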

6.3  Discussion 

Notice that a term cannot appear where an expression is expected, so an application whose
function part is itself an application is not a valid program in Pure. For instance, consider a
program that defines and then calls a curried addition function.

    rec f(x) = (rec g(y) = x + y in g)
    in f(24)(42)

This is not a term. One would have to write this by using a continuation function as an
interface between the application f(24) and the application v(42), where v is the result of the
former application.

    rec f(x, k) = (rec g(y) = x + y in k(g))
    in (rec k(v) = v(42) in f(24, k))

We are moving toward a form of continuation-passing style (CPS). CPS was studied early
as the subset of λ-calculus terms for which call-by-name and call-by-value reduction strategies
are equivalent [Plo75]. The first major practical use of CPS was in the Scheme Rabbit
compiler [Ste78], which translated source programs into a restricted syntax much like ours. This
was later done in the Orbit Scheme compiler [Kra88] and then in the SML/NJ compiler [App92].
All translations are based on a universal calling convention in which all source functions take
a continuation argument and all source applications must thus pass a continuation function
describing the remainder of the computation. So, for instance, an automatic CPS converter
might produce

    rec f(x, k) = (rec g(y, k) = k(x + y) in k(g))
    in (rec k(v) = v(42, top) in f(24, k))

for the program above.

These translations are well studied, both in theory and practice [Plo75] (see also other
references cited above), and so we will not go any further into CPS here. Suffice it to say that
our syntactic restriction does not limit the expressivity of the language.
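
As an informal illustration of the calling convention, the curried addition program can be transcribed into Python in both direct style and fully converted CPS. The sketch is not part of Pure; the function names f_direct, f_cps, k, and top are chosen here only to mirror the programs above, and Python's own return mechanism stands in for the end of the computation.

```python
def f_direct(x):
    """Direct style: f returns the function g to its caller."""
    def g(y):
        return x + y
    return g


def top(v):
    """The initial continuation: it simply delivers the final result."""
    return v


def f_cps(x, k):
    """CPS: f never returns g; it passes g to its continuation k instead."""
    def g(y, k2):
        return k2(x + y)
    return k(g)


def k(v):
    """Continuation of the call f(24, k): apply the resulting function to 42."""
    return v(42, top)


direct_result = f_direct(24)(42)
cps_result = f_cps(24, k)
```

Both calls compute the same sum; in the CPS version every intermediate result is handed forward to a continuation rather than returned, which is why a state of a CPS program never needs to remember a caller.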


6.4  Semantics 

In Chapter 5 we modeled a Mini-C program by a transition system in which a state is a
pair of a control point, representing the current syntactic position of execution, and a store,
representing the state of the memory/data. We gave alternate definitions of the single-step
transitions induced by a program, one in terms of meta-rules (the standard practice) and one
in terms of transfer relations. It is the latter formulation that provides a basis for program
analysis with our methodology. In this section, we discuss the semantics of Pure in a similar
fashion.


6.4.1  Control,  data,  and  execution  states 

Actually, both the notion of control point and the notion of store are simpler in Pure than in
Mini-C.

We designed a whole notation for the control points of a Mini-C program, but it turns out
that one can simply use the subterms of a Pure program to function as control points.^1 This
is intuitively pleasing because the control point itself has meaning: if an execution is at control
point t, it means that the rest of the execution is the evaluation of t. (This works because of
Pure's CPS-like syntax.) In contrast, a sequence of integers that functions as a control point
of a Mini-C program has no meaning alone; it is only an index into the text of the program.

As for stores, because there are no assignable data structures in Pure, much of the complexity
of stores is not needed to model the data. Recall that a store is a map from l-values, which
are either variables or references v.v', to values. In Pure there is no need for the references,
and so all that is needed is a map from variables to values. This is the familiar notion of an
environment:

\[ \rho \in \mathit{Env} = \mathit{Var} \to \mathit{Val} \qquad \text{environments} \]

Recall that there were five different kinds of values (members of Val) in Mini-C:

•  constants (members of Constant)

•  field names (members of Field)

•  pointers to assignable data structures (members of Pointer)

•  immutable tuples (members of Val*)

•  the undefined value undef

^1 Actually, in Section 6.4.3 we will need to refer to a function definition $x(\vec{y}) = t$ as a control point. In this
case, that control point is identified with the control point t. We will discuss this later.

Again because there are no assignable data structures in Pure, pointers are not needed, and
neither are field names. However, Pure has first-class functions, and so there must be values
that model these functions. As we explained in Section 6.1, it is best for many applications of
program analysis to model a function as a closure. A closure has two parts:

•  a function g (a phrase of the form $x(\vec{y}) = t$)

•  an environment, providing the values for all free variables of g

The set of closures is thus defined as follows:

\[ \langle g, \rho \rangle \in \mathit{Closure} \]

Finally, the set of values is given by the following equation:

\[ v \in \mathit{Val} = \mathit{Constant} + \mathit{Closure} + \mathit{Val}^* + \{\mathtt{undef}\} \]

Note there is a circularity in the equation for Val, and there is another circularity in the equations
for Env, Closure, and Val. The actual sets are defined by mutual induction, as the least solution
to these three equations.

As an aside, an infamous difficulty in designing analyses of languages with either first-class
functions or data structures lies in how to deal with these circularities in an analysis
algorithm that is guaranteed to terminate. The circularity in the equation for Val arises from
immutable tuples, and indeed most static analyses of even immutable structured values (not
to mention mutable data structures) are quite crude (e.g., [Wad87], [Hei92]). One of the
few satisfactory analyses of structured values is [Den92], but it is still somewhat ad hoc and
also quite complicated. The other circularity arises from first-class functions, and again it is
no coincidence that analysis designers have traditionally encountered trouble with first-class
functions. The usual ad hoc approaches are to be found in the work on denotational-based
abstract interpretations, usually applied to strictness analysis [BHA86], the work on finite
approximations of closures [Shi91], or the work on augmenting higher-order type systems with
“effects” [TJ92]. None of this work seems satisfactory. What lies at the root of these problems
is the ubiquitous analysis methodology that begins with the design of a (hopefully clever)
approximation of an infinite domain. In contrast, because our methodology is centered around
the analysis of the changes to a store (or environment in the case of Pure) rather than the store
(or environment) itself, complexities (induction, recursion, infinite sets, etc.) in the structure
of values themselves do not cause any a priori difficulty; rather, the focus is on the complexity
of the transitions between states.

As we explained above, a state of execution comprises the state of control, which is a term,
and the state of data, which is an environment.

\[ \mathit{State} = \mathit{Term} \times \mathit{Env} \]

What remains is to define the transitions induced by a Pure term. These model the single
steps of execution of the term.


6.4.2  Transitions  via  meta-rules 

Here we describe how each kind of term induces transitions. No transition is possible from an
expression term; if an execution reaches the state

\[ (e, \rho) \]

then the execution halts, and the result of the execution is the value v such that

\[ e \vdash_\rho v \]

which is guaranteed to be unique because Pure is deterministic.

Each of the four other kinds of terms takes transitions.


•  let x = e in t. In this case, x is bound to a value to which e evaluates in the current
environment, and execution proceeds to t.

\[ \frac{e \vdash_\rho v}{(\mathtt{let}\ x = e\ \mathtt{in}\ t,\ \rho) \longmapsto (t,\ \rho[x \mapsto v])} \]

Note that an environment is just a store in which all reference l-values (v.v') are bound
to undef, and so for convenience we use the same definition of $\vdash$ for the evaluation of an
expression in an environment. Similarly, the definition of $\rho[x \mapsto v]$ is a special case of
store extension, defined in Section 2.4.2.

\[ (\rho[x \mapsto v])\, y = \begin{cases} v & \text{if } x = y \\ \rho\, y & \text{otherwise} \end{cases} \]

This notion of variable binding may seem strange. Why isn't it necessary to rename the
variable x in order to avoid conflicts with other occurrences of x in $\rho$ that might be needed
later in the computation? It turns out that these other occurrences of x will always be
captured in closures and so will not interfere with the update of the environment. We will
discuss this further in Section 6.5.


•  rec g in t where $g = (x(\cdots) = \cdots)$. In this case, x is bound to a closure whose
function component is g and whose environment component is the current environment, and
execution proceeds to t.

\[ \frac{g = (x(\cdots) = \cdots)}{(\mathtt{rec}\ g\ \mathtt{in}\ t,\ \rho) \longmapsto (t,\ \rho[x \mapsto \langle g, \rho \rangle])} \]

•  $e(\vec{e})$. For a transition to be possible from this term, e must evaluate to some closure
$\langle x(\vec{y}) = t,\ \rho' \rangle$ in the current environment. In that case, the new environment is $\rho'$ extended
with the following additional bindings: a binding from x to the closure itself and a binding
from the variables $\vec{y}$ to the corresponding values to which the expressions $\vec{e}$ evaluate in
the current environment. Then execution proceeds to t.

\[ \frac{e \vdash_\rho v \qquad (\vec{e})_i \vdash_\rho (\vec{v})_i \qquad v = \langle x(\vec{y}) = t,\ \rho' \rangle}{(e(\vec{e}),\ \rho) \longmapsto (t,\ \rho'[x \mapsto v][\vec{y} \mapsto \vec{v}])} \]

Note that $\rho$ is simply “thrown away” in this transition. This is because Pure terms are
in continuation-passing style, and so an evaluation returns only when the execution of the
entire program is complete. All parts of $\rho$ that will be needed in the future computation
must be passed through via the arguments $\vec{e}$, typically in the closure of a continuation
function.

It may at first seem unnecessary to have closures in the first place. If we appropriately
renamed variables during execution, we could ensure that the bindings of the free variables
of a function g are never overwritten later in the execution. In this way, there would be
no need to save and restore $\rho'$; instead, $\rho$ may simply be threaded through on function
application. We discuss this choice further in Section 6.5.

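•  if e then t else t'. For a transition to be possible from this term, e must evaluate to
either true, in which case evaluation proceeds to t, or false, in which case evaluation
proceeds to t'.

\[ \frac{e \vdash_\rho \mathtt{true}}{(\mathtt{if}\ e\ \mathtt{then}\ t\ \mathtt{else}\ t',\ \rho) \longmapsto (t,\ \rho)} \qquad \frac{e \vdash_\rho \mathtt{false}}{(\mathtt{if}\ e\ \mathtt{then}\ t\ \mathtt{else}\ t',\ \rho) \longmapsto (t',\ \rho)} \]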
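
The four meta-rules can be read as a small-step interpreter. The following Python sketch is one possible transcription, under several assumptions: the tuple names (Var, Const, Prim, Let, Rec, Call, If, Fun, Closure) and the functions eval_exp, step, and run are hypothetical, only a few arithmetic and comparison primitives are supplied, and None stands in for undef.

```python
import operator
from collections import namedtuple

# Expressions
Var = namedtuple("Var", "name")            # variable lookup
Const = namedtuple("Const", "value")       # nullary primitive (integer or boolean)
Prim = namedtuple("Prim", "op args")       # primitive application p(e1, ..., en)
# Terms
Let = namedtuple("Let", "x e t")           # let x = e in t
Rec = namedtuple("Rec", "g t")             # rec g in t
Call = namedtuple("Call", "e args")        # e(e1, ..., en)
If = namedtuple("If", "e t t2")            # if e then t else t'
# Function definitions and closure values
Fun = namedtuple("Fun", "x params body")   # g = x(y1, ..., yn) = t
Closure = namedtuple("Closure", "g env")   # the pair <g, rho>

UNDEF = None  # stands in for the value undef
PRIMOPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
           "=": operator.eq, "<": operator.lt}


def eval_exp(e, env):
    """Evaluate expression e in environment env (the judgment e |- v)."""
    if isinstance(e, Var):
        return env.get(e.name, UNDEF)
    if isinstance(e, Const):
        return e.value
    return PRIMOPS[e.op](*[eval_exp(a, env) for a in e.args])


def step(t, env):
    """Return the successor state (t', env'), or None if no transition is possible."""
    if isinstance(t, Let):
        return t.t, {**env, t.x: eval_exp(t.e, env)}
    if isinstance(t, Rec):
        return t.t, {**env, t.g.x: Closure(t.g, env)}
    if isinstance(t, Call):
        v = eval_exp(t.e, env)
        if not isinstance(v, Closure):
            return None
        args = [eval_exp(a, env) for a in t.args]
        # rho'[x -> v][y -> args]: the caller's environment is discarded
        return v.g.body, {**v.env, v.g.x: v, **dict(zip(v.g.params, args))}
    if isinstance(t, If):
        c = eval_exp(t.e, env)
        if c is True:
            return t.t, env
        if c is False:
            return t.t2, env
    return None  # expression terms (and stuck states) take no transition


def run(t, env):
    """Iterate step until no transition applies, then evaluate the final expression."""
    state = (t, env)
    while (nxt := step(*state)) is not None:
        state = nxt
    return eval_exp(*state)


# rec f(x, k) = (rec g(y) = x + y in k(g)) in (rec k(v) = v(42) in f(24, k))
g = Fun("g", ("y",), Prim("+", (Var("x"), Var("y"))))
f = Fun("f", ("x", "k"), Rec(g, Call(Var("k"), (Var("g"),))))
k = Fun("k", ("v",), Call(Var("v"), (Const(42),)))
prog = Rec(f, Rec(k, Call(Var("f"), (Const(24), Var("k")))))
result = run(prog, {})
```

Environments are copied rather than updated in place, so the environment captured in a closure is never disturbed by later bindings, which is the property the rules rely on.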

6.4.3  Transitions  via  transfer  relations 

Instead of using meta-rules to define the transitions, it is possible to represent them directly in
a computer as transfer relations. The methodology here is exactly the same as for Mini-C. The
transfer relation

\[ \Delta_{t,t'} \]

represents all and only the valid single-step transitions from term t to term t'; it is a binary
relation between the environment at t and the environment at t'.


Functions  as  control  points 

In Mini-C, the definition of the single-step transfer relations precisely corresponded to the
meta-rules that defined the transitions. However, there is a subtle issue concerning first-class
functions. It turns out that to define the single-step transfer relations of a Pure term, we need
to have a slightly special treatment for the control points of functions.

In particular, we need to use a phrase $x(\vec{y}) = t$ as a control point. In this case, that control
point is identified with t, but in a rather subtle way. For instance, consider two functions with
the same body $t_2$, but different names and parameters:

\[ g = x(\vec{y}) = t_2 \qquad\qquad g' = x'(\vec{y}') = t_2 \]

The transfer relation

\[ \Delta_{t_1,g} \]

describes the single steps from term $t_1$ into the function g, and the transfer relation

\[ \Delta_{t_1,g'} \]

describes the single steps from term $t_1$ into the function g'. Both of these may be composed
with the transfer relation

\[ \Delta_{t_2,t_3} \]

(for some term $t_3$) that describes both the first step of function g and the first step of function g'.

So, in other words, g and g' are both identified with $t_2$ for the purpose of relating the
transfer-relation formulation of the semantics with the meta-rule formulation, and thus for
composing a transfer relation that ends with one control point (in this case, g or g') with a
transfer relation that begins with the same control point (in this case, $t_2$). However, one may
give separate definitions for $\Delta_{t_1,g}$ and $\Delta_{t_1,g'}$.


Primitive  operations  to  support  closures 

It is necessary to add three new families of primitive operations to Primop in order to build and
examine closures. All of these operations are simple.^2

•  There is an n-ary simple primitive operation $\mathtt{closure}_{g,(z_1,\ldots,z_n)}$ for every function g and
variables $z_1,\ldots,z_n$ that creates a closure whose function is g and whose environment
binds the variables $z_1,\ldots,z_n$. It is defined as follows:

\[ \mathtt{closure}_{g,(z_1,\ldots,z_n)}(v_1,\ldots,v_n) \Rightarrow \langle g, \rho \rangle \]

where:

\[ \rho\, z_i = v_i \]
\[ \rho\, z = \mathtt{undef} \quad \text{if } \forall\, 1 \le i \le n.\ z \ne z_i \]

•  There is a unary simple primitive operation $\mathtt{code}_g$ for every function g that tests whether
the function of a closure is equal to g. It is defined as follows:

\[ \mathtt{code}_g(\langle g, \rho \rangle) \Rightarrow \mathtt{true} \]
\[ \mathtt{code}_g(v) \Rightarrow \mathtt{false} \quad \text{otherwise} \]

•  There is a unary simple primitive operation $\mathtt{env}_z$ for every variable z that returns the
value of the variable in the environment of a closure. It is defined as follows:

\[ \mathtt{env}_z(\langle g, \rho \rangle) \Rightarrow (\rho\, z) \]
\[ \mathtt{env}_z(v) \Rightarrow \mathtt{undef} \quad \text{if } v \notin \mathit{Closure} \]

^2 Note that in general it makes sense only to have simple primitive operations because Pure is a deterministic
pure language.
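
The three families of primitives can be sketched as Python factories, one per index. This is an assumption-laden illustration: the names make_closure_op, make_code_op, and make_env_op are hypothetical, a function g is identified simply by a string, and the string "undef" stands in for the undefined value.

```python
from dataclasses import dataclass

UNDEF = "undef"  # stand-in for the undefined value


@dataclass(frozen=True)
class Closure:
    g: str       # the function, identified here by a name for brevity
    env: tuple   # bindings ((z1, v1), ..., (zn, vn)); every other variable is undef


def make_closure_op(g, zs):
    """closure_{g,(z1..zn)}: n-ary primitive building <g, rho> with rho z_i = v_i."""
    def closure_op(*vs):
        if len(vs) != len(zs):
            raise TypeError("closure primitive applied to the wrong number of values")
        return Closure(g, tuple(zip(zs, vs)))
    return closure_op


def make_code_op(g):
    """code_g: true exactly when the argument is a closure whose function is g."""
    def code_op(v):
        return isinstance(v, Closure) and v.g == g
    return code_op


def make_env_op(z):
    """env_z: the value of z in a closure's environment; undef for non-closures."""
    def env_op(v):
        if not isinstance(v, Closure):
            return UNDEF
        return dict(v.env).get(z, UNDEF)
    return env_op


c = make_closure_op("g", ("y",))(7)
```

For instance, make_code_op("g")(c) holds while make_code_op("h")(c) does not, and make_env_op("y")(c) recovers the value 7 stored at construction time.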


Free  variables  and  bisimulation 


For technical reasons that we will explain below, it is necessary to introduce a kind of equivalence
relation on states.

One can also easily show that the only variables whose values might be needed in the
execution of term t are the free variables of t. The following bisimulation expresses this precisely.

Definition 11 (Similar values and states) We define the similar relation $\sim$ on values and
states as follows, where FV(t) denotes the free variables of t, and similarly for FV(g).

•  Two values v and v' are said to be similar (written $v \sim v'$) if either $v = v'$ or $v = \langle g, \rho \rangle$,
$v' = \langle g, \rho' \rangle$, and

\[ x \in FV(g) \Rightarrow (\rho\, x) \sim (\rho'\, x). \]

•  Two states $(t, \rho)$ and $(t', \rho')$ are said to be similar (written $(t, \rho) \sim (t', \rho')$) if $t = t'$ and

\[ x \in FV(t) \Rightarrow (\rho\, x) \sim (\rho'\, x). \]

Proposition 1 (Bisimulation) Let $\longmapsto^*$ be the transitive closure of $\longmapsto$. If $\psi_1 \sim \psi_1'$ and
$\psi_2 \sim \psi_2'$ then

The  single-step  transfer  relations 

Now we can define a single-step transfer relation $\Delta_{t,t'}$ for every two terms t and t'; this relation
specifies how the environment changes in a transition from t to t'. We also define $\Delta_{t,g}$ for
the transitions into a function g, as described above. In Mini-C, these transfer relations are
indexed by control points instead of terms; because there are only a finite number of control
points in a Mini-C program, the number of single-step transfer relations for a Mini-C program
is also finite. The situation is not quite analogous for Pure. A Pure term t does indeed have
only a finite number of subterms. However, if t is meant to be executed in an environment
that initially contains some (non-undef) values, which would typically be the case for partial
programs, or in other words for terms t that have free variables, then some of those values
might be closures that contain terms not in t.

In Mini-C the single-step transfer relations precisely mirrored the semantic meta-rules.
This is almost the case here, but there is some difference due to functions. Above, we gave the
meta-rules for each of the four kinds of Pure terms. Here, we do the same for the single-step
transfer relations, all of which are $\emptyset$ unless defined otherwise below.

•  let x = e in t. This case is just like the meta-rule; there is a single-step transfer relation
from this term to t that describes the new binding to the environment:

\[ \Delta_{(\mathtt{let}\ x = e\ \mathtt{in}\ t),\ t} = x \mapsto e \]

•  rec g in t. This case is very much like the meta-rule; there is a single-step transfer relation
from this term to t that describes the binding of the new closure. But there is a subtle
difference. The meta-rule builds a closure that contains the entire current environment,
while the transfer relation builds an environment that keeps the bindings only of the free
variables of the function g. The bisimulation proposition above justifies this change. This
relation uses a primitive operation to create this closure:

\[ \Delta_{(\mathtt{rec}\ g\ \mathtt{in}\ t),\ t} = x \mapsto \mathtt{closure}_{g,(z_1,\ldots,z_n)}(z_1,\ldots,z_n) \]

where x is the name of the function g and $\{z_1,\ldots,z_n\} = FV(g)$ (ordered arbitrarily).

This difference between the meta-rule and the transfer relation is not conceptually deep.
We could very well have defined the meta-rule to restrict the environment of the closure
to the free variables of g, as well, but that choice is unnecessarily cumbersome, not to
mention non-standard. On the other hand, there are two reasons why we define the
transfer relation as we do.

—  We have less flexibility in the design of the transfer relation. Because environments
are not members of Val (in which case we might have imagined a nullary
context-sensitive primitive operation that evaluates in $\rho$ to $\rho$ itself), it is necessary to build
the environment explicitly, as we do with $\mathtt{closure}_{g,(z_1,\ldots,z_n)}$. Therefore, we must
know the set of variables to be bound, and it is both more convenient and more
flexible to examine g locally to see what variables it might need than to examine
the lexical context of g within the larger program to see what variables are merely
allowed to be free in g.

—  Transfer relations are actual computer-representable structures, and so for practical
reasons these structures should be as small as possible. Restricting the environment
to the free variables of g is a simple way to potentially reduce the textual size of the
closure.

•  $e(\vec{e})$. This case is quite different from the meta-rule. Execution from this term will
transition to the function g of the closure to which e evaluates. Thus, the control part of
the state after the transition not only depends on the environment part of the state before
the transition, as is the case with conditionals, but is actually taken from the environment.
This relationship is no problem for the meta-rule formulation of a transition system,
because it is just another example of how one state in an execution depends in some
fashion on the previous state in the execution. But such transitions are difficult to express as
a single-step transfer relation because the transfer relation itself is already parameterized
over the two control points, in this case terms. In other words, when defining $\Delta_{t,g}$,
specifying all transitions from a state at t into the function g (recall that the control
point g is identified with g's body), one cannot express how the control point g depends
on the environment at the beginning of the transition, because g is fixed.

In all transitions in Mini-C and all other transitions in Pure, the only dependency of the
control part of the latter state on the store part of the former state (or environment part,
in the case of Pure) is for conditionals, in which the store (or environment) in the former
state determines which one of two possible control points is in the latter state. In contrast,
function application is fundamentally more difficult. The reason is that the control point
(or term, in Pure) to which execution proceeds is part of the store (or environment)
itself. Thus, given an application term $t = e(\vec{e})$, one cannot extract the function g from e
as in the meta-rule; rather, one must define $\Delta_{t,g}$ to implement the appropriate condition
that e will indeed evaluate to a closure whose function is g.

This suggests a definition of the form

\[ \Delta_{t,g} = \mathtt{code}_g(e)?\ \Delta \]

for some $\Delta$. The choice of $\Delta$ brings up the second difficulty with functions, and this time
not limited only to first-class functions. Namely, the transition from function application
to function body is the only time in which the environment is changed wholesale. This,
too, is somewhat at odds with our particular language of transfer relations. We provided
parallel assignment in the language of transfer relations to express a store modification,
but not to replace an entire store with a new one. It is not as bad as it seems, however,
because Pure uses environments, which contain only variable bindings. Furthermore,
one can easily show that the only variables bound (to a non-undef value) when execution
is at a term t are the variables in the lexical scope of t, a well-known concept that we do
not define formally here.

Therefore, one solution would be to define a new nullary primitive operation undef that
evaluates to the value undef and then define $\Delta$ to bind all variables in the scope of $e(\vec{e})$ to
undef and all variables in the scope of g to their appropriate value, with the latter taking
precedence over the former for any variables in both scopes.

However, we choose a different solution, largely for practical reasons. The bisimulation
above tells us that in any transition to g it is sufficient to ensure only that all free variables
of g are bound correctly. The resulting execution may not be identical to the one given by
the meta-rules, but will be equivalent modulo the bisimulation relation $\sim$. It is also easy
to see that this does not affect the final result of the program. So we have the following
definition.

\[ \Delta_{e(\vec{e}),\ g} = \mathtt{code}_g(e)?\ \ z_1,\ldots,z_n \mapsto e_1,\ldots,e_n \]

if $g = x(\vec{y}) = t$, $\{z_1,\ldots,z_n\} = FV(t)$, and

\[ e_i = \begin{cases} e & \text{if } z_i = x \\ (\vec{e})_j & \text{if } z_i = (\vec{y})_j \\ \mathtt{env}_{z_i}(e) & \text{otherwise} \end{cases} \]

Thus, the assignment relation, instead of replacing the environment wholesale, as done
in the meta-rule, simply ensures that all of the free variables of the function are bound
correctly.


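•  if e then t else t'. This case is just like the meta-rules; there are two single-step transfer
relations, one from this term to t filtering the true condition, and the other from this
term to t' filtering the false condition:

\[ \Delta_{(\mathtt{if}\ e\ \mathtt{then}\ t\ \mathtt{else}\ t'),\ t} = e? \qquad\qquad \Delta_{(\mathtt{if}\ e\ \mathtt{then}\ t\ \mathtt{else}\ t'),\ t'} = e¿ \]

Here $e?\ \Delta$ is an abbreviation for $e?\ \Delta \mid \emptyset$, and $e¿\ \Delta$ is an abbreviation for $e?\ \emptyset \mid \Delta$.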


Symbolic evaluation of $\mathrm{code}_g$ and $\mathrm{env}_z$


Whenever one adds a new primitive operation to Primop, one needs to define its symbolic
evaluation. Almost all primitives are context-independent, and it is safe to use the identity
function for their symbolic evaluation. This is the case with $\mathrm{closure}_{g(\bar z)}$,
$\mathrm{code}_g$, and $\mathrm{env}_z$, but in the case of the latter two it is important to
perform some simple but very useful simplifications. We define their symbolic evaluations as
follows.


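\[ \mathrm{env}_{z_i}(\mathrm{closure}_{g(z_1,\dots,z_n)}(e_1,\dots,e_n)) = e_i \]

This is similar to the symbolic evaluation of $\pi_i$ that selects the $i$th component of a tuple.

\[ \mathrm{code}_g(\mathrm{closure}_{g'(\bar z)}(\bar e)) = \begin{cases} \mathit{true} & \text{if } g = g' \\ \mathit{false} & \text{otherwise} \end{cases} \]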


We  will  see  why  these  simplifications  are  important  in  the  following  example. 
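
The two simplifications can be sketched in Python. This is only an illustrative sketch, not
the dissertation's implementation: the tuple encoding of expressions and the names `closure`,
`sym_env`, and `sym_code` are assumptions introduced for this example.

```python
# Hypothetical encoding (an assumption of this sketch): expressions are tuples.
#   ("closure", g, zs, es)  stands for closure_{g(z1..zn)}(e1..en)
#   ("env", z, e)           stands for env_z(e)
#   ("code", g, e)          stands for code_g(e)
# Any other expression (here, a plain string) is treated as opaque.

def closure(g, zs, es):
    return ("closure", g, tuple(zs), tuple(es))

def sym_env(z, e):
    """Symbolic evaluation of env_z(e)."""
    if isinstance(e, tuple) and e[0] == "closure" and z in e[2]:
        return e[3][e[2].index(z)]   # select the expression saved for z
    return ("env", z, e)             # otherwise behave as the identity

def sym_code(g, e):
    """Symbolic evaluation of code_g(e)."""
    if isinstance(e, tuple) and e[0] == "closure":
        return e[1] == g             # the test is decided statically
    return ("code", g, e)            # unknown function: keep the test

c = closure("g", ["y"], ["y"])
print(sym_env("y", c))      # the saved expression for y
print(sym_code("g", c))     # a known closure for g
print(sym_code("g", "f"))   # an unknown function value f
```

When the argument is not syntactically a closure, both functions fall back to the identity,
which matches the treatment of context-independent primitives described above.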
A small example


Consider  the  Pure  program 


rec  g  in  f  (1) 


where 

g = (f(x) = x + y).

By  the  meta-rule  formulation  of  the  transition-system  semantics,  the  execution  of  this  program 
in  an  environment  in  which  y  is  bound  to  n  proceeds  as  follows. 


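\[
\begin{array}{rl}
 & (\mathtt{rec}\ g\ \mathtt{in}\ f(1),\ \{y \mapsto n\}) \\
\longmapsto & (f(1),\ \{f \mapsto (g, \{y \mapsto n\}),\ y \mapsto n\}) \\
\longmapsto & (x + y,\ \{f \mapsto (g, \{y \mapsto n\}),\ x \mapsto 1,\ y \mapsto n\})
\end{array}
\]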

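There are two transitions in this execution. The first one is described by the transfer relation

\[ \Delta_{(\mathtt{rec}\ g\ \mathtt{in}\ f(1)),\,(f(1))} \;=\; f \mapsto \mathrm{closure}_{g(y)}(y) \]

and the second one is described by the transfer relation

\[ \Delta_{(f(1)),\,g} \;=\; \mathrm{code}_g(f)?\ \mid\ x, y \mapsto 1,\ \mathrm{env}_y(f). \]

The composition of the two transitions is described by the transfer relation

\[ \Delta_{(\mathtt{rec}\ g\ \mathtt{in}\ f(1)),\,(f(1)),\,g} \;=\; \Delta_{(\mathtt{rec}\ g\ \mathtt{in}\ f(1)),\,(f(1))} \mathbin{©} \Delta_{(f(1)),\,g}. \]

If the symbolic evaluations of $\mathrm{code}_g$ and $\mathrm{env}_z$ performed no simplifications,
then © would return

\[ \mathrm{code}_g(\mathrm{closure}_{g(y)}(y))?\ \mid\ f, x, y \mapsto \mathrm{closure}_{g(y)}(y),\ 1,\ \mathrm{env}_y(\mathrm{closure}_{g(y)}(y)) \]

as this composition, which is correct but extremely cumbersome. However, with the symbolic
evaluations we defined above, © returns

\[ f, x, y \mapsto \mathrm{closure}_{g(y)}(y),\ 1,\ y \]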


which  exploits  the  fact  that  the  called  function  is  known  in  order  to  both  eliminate  the  dynamic 
condition  on  the  control  flow  and  to  propagate  statically  the  value  of  y  through  the  closure. 
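
The effect of the simplifications on this composition can be reproduced with a small Python
sketch. It is not the algorithm © itself, only an illustration under stated assumptions: a
transfer relation is encoded as a pair (filter, assignments), composition is done by
substituting the first step's assignments into the second step, and every name below
(`compose`, `subst`, `mk_env`, `mk_code`, the tuple tags) is hypothetical.

```python
TRUE, FALSE = ("const", "true"), ("const", "false")

def var(x):
    return ("var", x)

def mk_env(z, e, simplify):
    # env_z(closure_{g(z1..zn)}(e1..en)) simplifies to the e_i saved for z
    if simplify and e[0] == "closure" and z in e[2]:
        return e[3][e[2].index(z)]
    return ("env", z, e)

def mk_code(g, e, simplify):
    # code_g applied to a known closure is statically true or false
    if simplify and e[0] == "closure":
        return TRUE if e[1] == g else FALSE
    return ("code", g, e)

def subst(e, binds, simplify):
    """Rewrite e so that it refers to the state before the first step."""
    tag = e[0]
    if tag == "var":
        return binds.get(e[1], e)
    if tag == "const":
        return e
    if tag == "closure":
        return ("closure", e[1], e[2],
                tuple(subst(a, binds, simplify) for a in e[3]))
    if tag == "env":
        return mk_env(e[1], subst(e[2], binds, simplify), simplify)
    if tag == "code":
        return mk_code(e[1], subst(e[2], binds, simplify), simplify)
    if tag == "and":
        return ("and", subst(e[1], binds, simplify), subst(e[2], binds, simplify))
    raise ValueError(tag)

def compose(r1, r2, simplify=True):
    """Relation for 'r1, then r2'; a relation is (filter, assignments)."""
    (f1, a1), (f2, a2) = r1, r2
    f2s = subst(f2, a1, simplify)
    if f1 == TRUE:
        filt = f2s
    elif f2s == TRUE:
        filt = f1
    else:
        filt = ("and", f1, f2s)
    assigns = dict(a1)   # bindings from step 1 survive unless step 2 rebinds them
    assigns.update({x: subst(e, a1, simplify) for x, e in a2.items()})
    return filt, assigns

clo = ("closure", "g", ("y",), (var("y"),))
step1 = (TRUE, {"f": clo})                                  # rec g in f(1)  to  f(1)
step2 = (("code", "g", var("f")),                           # f(1)  to  body of g
         {"x": ("const", 1), "y": ("env", "y", var("f"))})

raw = compose(step1, step2, simplify=False)
simple = compose(step1, step2, simplify=True)
```

Here `raw` keeps the dynamic test on the closure and the `env` projection, while `simple`
has a trivially true filter and binds y directly to y.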

A  subtle  point  that  is  unrelated  to  these  symbolic  simplifications  concerns  the  final  binding 
of  f .  Note  that: 

•  In  the  execution  trace  of  3  states  shown  above,  the  final  environment  contains  a  binding 
for  f . 
• The transfer relation shown immediately above describes that the net effect of this
length-3 control path includes an assignment to f.

• The value bound to f is the same in both cases.


This may seem exactly as expected. After all, we did describe the meta-rules and the single-step
transfer relations as alternate formulations of the same transition-system semantics. But as it
turns out, the fact that the net effects of both formulations on f are equivalent is an accident in
this case. The explanation lies in the bisimulation relation we defined earlier. For this case:

• In the meta-rule formulation, the second transition does a wholesale replacement of the
caller’s environment with the closure of the callee and then extends this environment with
both f and x, representing the passing of those two values to the callee.

• In the transfer-relation formulation, the second transition binds the free variables of the
function body, which is the set {x,y}, but does not remove the binding of f that was
present in the caller’s environment.


In this case, the two bindings of f happen to be the same, but this will not generally be the
case. However, the bisimulation relation tells us that in a state at term t we may simply “filter
out” all bindings of variables not in FV(t), and then the correspondence between the meta-rule
formulation and the transfer-relation formulation will be exact.

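In this case, we could thus view the transition trace as

\[
\begin{array}{rl}
 & (\mathtt{rec}\ g\ \mathtt{in}\ f(1),\ \{y \mapsto n\}) \\
\longmapsto & (f(1),\ \{f \mapsto (g, \{y \mapsto n\})\}) \\
\longmapsto & (x + y,\ \{x \mapsto 1,\ y \mapsto n\})
\end{array}
\]

and the composed transfer relation as

\[ x, y \mapsto 1, y. \]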


As a final note, we comment on the final state of execution. In general, the final state
of an execution is a state $(e, \rho)$, and the resulting value of the execution is a value $v$
such that

\[ e \vdash_\rho v. \]

In the transfer-relation formulation, we may use E to obtain an expression that represents the
value of a term in terms of the free variables of the term. In the example above, this corresponds
to

\[ E(x + y)\ \bigl[\, f, x, y \mapsto \mathrm{closure}_{g(y)}(y),\ 1,\ y \,\bigr] \]

which returns $1 + y$.
6.5  Variable  Renaming  vs.  Closures


The semantics that we have given for Pure does not involve variable renaming. For instance,
the term

let x = 1 in x + z


and the term


let y = 1 in y + z


are distinguished in Pure, although they differ merely by the choice of variable name. In
fact, these two distinct terms induce two distinct families of transitions. The first term induces
the family of transitions

\[ (\mathtt{let}\ x = 1\ \mathtt{in}\ x + z,\ \rho) \longmapsto (x + z,\ \rho[x \mapsto 1]) \]

ranging over environments $\rho$, while the second term induces the family of transitions

\[ (\mathtt{let}\ y = 1\ \mathtt{in}\ y + z,\ \rho) \longmapsto (y + z,\ \rho[y \mapsto 1]). \]

But this may seem strange. If $\rho$ already has a binding for the variable in question (x for
the first case and y for the second case), then what assurance do we have that that binding is
no longer needed and may be discarded by the environment update? One may expect instead
a meta-rule for let-binding transitions that looks something like

\[ \frac{e \vdash_\rho v \qquad \rho\,x' = \mathit{undef}}{(\mathtt{let}\ x = e\ \mathtt{in}\ t,\ \rho) \longmapsto (t[x'/x],\ \rho[x' \mapsto v])} \]

where $t[x'/x]$ substitutes the variable $x'$ for all free occurrences of the variable $x$. Note that the
rule does not have the syntactic non-interference condition $x' \notin FV(t)$ because it is covered by
the semantic non-interference condition that $\rho\,x' = \mathit{undef}$. The notion of variable renaming is
based on $\alpha$-conversion of the $\lambda$-calculus [Bar84].

We  can  get  away  without  variable  renaming,  however.  First  we  describe  how  we  achieve  this 
and  compare  this  choice  with  a  semantics  based  upon  variable  renaming,  and  then  we  explain 
why  it  is  desirable  for  our  purposes  of  program  analysis  to  avoid  the  need  for  variable  renaming. 

In  this  section,  we  will  need  a  notion  of  how  to  examine  a  transition  system  to  determine 
that  it  is  reasonable.  We  will  start  by  defining  a  notion  of  well  formed  states,  and  then  we 
apply  the  following  test  to  the  transition  system. 

Definition 12 (Preservation of well-formedness) Given a notion of well-formedness on
states, a transition system is said to preserve well-formedness if, for all well formed states $\psi$,
$\psi \longmapsto \psi'$ implies that $\psi'$ is well formed.

Our semantics for Pure uses the following notion of well formed states.

Definition 13 (Well formed states (#1)) A state $(t, \rho)$ is well formed iff $\rho$ contains the
correct bindings for all free variables of $t$ (in other words, all $x \in FV(t)$).
When we say that a binding is “correct” we mean that it is the binding that one would intuitively
expect from an execution of the program in question.

Now, it is easy to see that the semantics that we have given for Pure preserves
well-formedness. For instance, consider the rule for let-binding transitions.

\[ \frac{e \vdash_\rho v}{(\mathtt{let}\ x = e\ \mathtt{in}\ t,\ \rho) \longmapsto (t,\ \rho[x \mapsto v])} \]

Note that $x \notin FV(\mathtt{let}\ x = e\ \mathtt{in}\ t)$ and $FV(t) \subseteq (FV(\mathtt{let}\ x = e\ \mathtt{in}\ t) \cup \{x\})$. Therefore, if $\rho$
contains the correct bindings for each $y \in FV(\mathtt{let}\ x = e\ \mathtt{in}\ t)$, then $\rho[x \mapsto v]$ will contain

• the correct bindings for each $y \in FV(\mathtt{let}\ x = e\ \mathtt{in}\ t)$ and

• the correct binding for $x$,

and thus will contain the correct bindings for each $y \in FV(t)$.

The rule for rec g in t is analogous. It is easy to see that the rule for function application
works.

\[ \frac{e \vdash_\rho v \qquad (\bar e)_i \vdash_\rho (\bar v)_i \qquad v = (x(\bar y) = t,\ \rho')}{(e(\bar e),\ \rho) \longmapsto (t,\ \rho'[x \mapsto v][\bar y \mapsto \bar v])} \]

Note that $\rho$ is discarded in the transition. The well-formedness of the state $(e(\bar e), \rho)$ thus
merely ensures that function $e$ and the arguments $\bar e$ evaluate to correct values. To show that
the state $(t, \rho'[x \mapsto v][\bar y \mapsto \bar v])$ is well formed, we must reason that, because $\rho'$ came from a
previous well formed state $(\mathtt{rec}\ x(\bar y) = t\ \mathtt{in}\ t', \rho')$, $\rho'$ contains the correct bindings for all
variables in $FV(\mathtt{rec}\ x(\bar y) = t\ \mathtt{in}\ t')$ and thus all variables in $FV(x(\bar y) = t)$. Therefore,
$\rho'[x \mapsto v][\bar y \mapsto \bar v]$ contains the correct bindings for $FV(t)$.

This  discussion  explains  the  purpose  of  closures,  which  save  the  bindings  in  p'  that  must  be 
restored  upon  function  application. 
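
The role of the saved environment can be illustrated with a small Python sketch of the three
transition rules discussed above. The term encoding, the restriction to single-argument
functions, and the names `step` and `eval_expr` are assumptions of this sketch, not part of
the formal semantics of Pure.

```python
# Hypothetical term encoding for a fragment of Pure:
#   ("let", x, e, t)             let x = e in t
#   ("rec", f, param, body, t)   rec f(param) = body in t
#   ("call", e_fun, e_arg)       e_fun(e_arg)
# Expressions: an int, a variable name (str), or ("+", e1, e2).

def eval_expr(e, env):
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env[e]
    op, a, b = e
    assert op == "+"
    return eval_expr(a, env) + eval_expr(b, env)

def step(term, env):
    kind = term[0]
    if kind == "let":
        _, x, e, t = term
        return t, {**env, x: eval_expr(e, env)}      # overwrite x, no renaming
    if kind == "rec":
        _, f, param, body, t = term
        clo = ("closure", f, param, body, dict(env)) # save the current bindings
        return t, {**env, f: clo}
    if kind == "call":
        _, e_fun, e_arg = term
        clo = eval_expr(e_fun, env)
        _, f, param, body, saved = clo
        arg = eval_expr(e_arg, env)
        # The caller's environment is discarded; the closure's environment
        # is restored and extended with the function and its argument.
        return body, {**saved, f: clo, param: arg}
    raise ValueError("no transition from this term")

# rec f(x) = x + y in let y = 100 in f(1), started with y bound to 5
prog = ("rec", "f", "x", ("+", "x", "y"),
        ("let", "y", 100, ("call", "f", 1)))
state = (prog, {"y": 5})
for _ in range(3):
    state = step(*state)
body, env = state
result = eval_expr(body, env)
```

In this run the caller rebinds y to 100 before the call, yet the body is evaluated with the
binding of y that was saved when the closure was created.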

Alternatively, we could dispense with closures altogether, representing a function at run
time as the function term $g$ instead of the closure $(g, \rho)$. This leads to a more complex notion
of well formed states.

Definition 14 (Well formed states (#2)) A state $(t, \rho)$ is well formed iff both

• $\rho$ contains the correct bindings for all $x \in FV(t)$, and

• $\rho$ contains the correct bindings for all $x \in FV(g)$ such that $\rho\,y = g$ for some variable $y$.

Intuitively,  we  “flatten  out”  all  the  closure  environments  into  a  single  global  environment  that 
is  threaded  through  the  execution.  To  accomplish  this,  we  will  need  to  create  fresh  variables 
on  the  fly,  which  means  that  we  will  need  to  rename  variables  at  run  time. 
For instance, the old rule for let-binding

\[ \frac{e \vdash_\rho v}{(\mathtt{let}\ x = e\ \mathtt{in}\ t,\ \rho) \longmapsto (t,\ \rho[x \mapsto v])} \]

no longer works. As we explained above, $x \notin FV(\mathtt{let}\ x = e\ \mathtt{in}\ t)$ and thus $x$ may be safely
overwritten under the first notion of well formed states. But with this second, more restricted
notion of well formed states, we cannot be sure that $x$ is not a free variable of some function $g$
in the range of $\rho$. Thus, we must rename $x$ to a fresh variable. We showed the resulting rule
earlier in this section:

\[ \frac{e \vdash_\rho v \qquad \rho\,x' = \mathit{undef}}{(\mathtt{let}\ x = e\ \mathtt{in}\ t,\ \rho) \longmapsto (t[x'/x],\ \rho[x' \mapsto v])} \]

It is rather easy to see that this rule preserves well-formedness under the second notion because
it never destroys any values in $\rho$.

But now we do not need closures, and the rule for function application threads the current
environment $\rho$ through, again renaming the newly bound variables to avoid clashes with
variables already bound in $\rho$. For simplicity of illustration, we show the case for single-argument
functions:

\[ \frac{e \vdash_\rho v \qquad e' \vdash_\rho v' \qquad v = (x(y) = t) \qquad \rho\,x' = \mathit{undef} \qquad \rho\,y' = \mathit{undef}}{(e(e'),\ \rho) \longmapsto (t[x'/x][y'/y],\ \rho[x' \mapsto v][y' \mapsto v'])} \]

Once again, it is easy to see that this rule preserves well-formedness, because once again we
rename the bound variables appropriately such that $\rho$ is extended rather than updated.

We have just presented (most of) a different style of transition-system semantics for Pure
in which we have traded closures for dynamic variable renaming. The resulting semantics is
arguably cleaner and more natural, but we have only considered the meta-rule formulation of
the semantics. To see the fundamental difficulty, consider what the single-step transfer relation

\[ \Delta_{(\mathtt{let}\ x = e\ \mathtt{in}\ t),\,t} \]

should be. Without variable renaming it is

\[ x \mapsto e \]

but with variable renaming it must be something like

\[ x = \mathit{undef}?\ \mid\ x \mapsto e \]

to perform the dynamic test that $x$ does not need to be renamed.
But then the composed two-step transfer relation

\[ \Delta_{(\mathtt{let}\ x = 1\ \mathtt{in}\ \mathtt{let}\ x = 2\ \mathtt{in}\ t),\,(\mathtt{let}\ x = 2\ \mathtt{in}\ t),\,t} \]

will be the empty relation instead of the expected

\[ x \mapsto 2 \]

because the variable $x$ was not renamed along the given length-three control path.

So,  in  summary,  the  semantics  based  on  variable  renaming  has  the  following  ramifications 
on  the  transfer-relation  form  of  the  semantics: 

• Whenever an analysis composes the transfer relation for a control path, it is the
responsibility of the analysis to rename (statically) the variables in the terms along that path
in a way that is guaranteed to capture all possible behaviors of any dynamic renaming of the
terms in the meta-rule semantics. For instance, in the above length-three control path,
the second occurrence of x must be renamed to a variable that is not in FV(t).

• A number of tests of the form x = undef will accumulate during composition of a control
path, complicating the presentation of the transfer relation. These tests are necessary
for any fixed control path because the transfer-relation terms in TR have no facility to be
dynamically renamed, and hence each term in TR is defined only on initial environments
for which the given choice of variables is already satisfactory. But the accumulation of
these tests is a practical disadvantage.

It  is  because  of  these  factors  that  we  choose  a  semantics  that  does  not  have  variable  renaming 
and  thus  needs  closures  to  store  multiple  dynamic  occurrences  of  the  same  static  variable. 

It  is  possible,  however,  that  there  is  a  different  semantic  approach,  based  on  a  different 
treatment  of  Pure  variables  in  the  transfer  relations,  that  would  not  suffer  the  above  factors. 
For  instance,  perhaps  the  environment  could  be  represented  as  a  list,  accessed  by  de  Bruijn 
indices  [dB72]  instead  of  variables. 


Chapter  7 


Extending  Pure  with  Mutable 
Records  and  Arrays 


Imperative  features  are  crucial  components  of  almost  all  languages,  even  “functional”  languages 
such  as  Scheme  and  Standard  ML.  In  this  chapter  we  extend  Pure  with  both  assignable  arrays 
and  records  with  assignable  fields.  We  call  the  resulting  language  Impure. 


7.1  Syntax 


We  first  add  some  new  terms  to  Pure. 


t ::= ...
   |  let x = {f_1 = e_1, ..., f_n = e_n} in t      record creation
   |  e.f := e'; t                                  record field assignment
   |  e.f                                           record dereference
   |  letarray x in t                               array creation
   |  e[e'] := e''; t                               array update
   |  e[e']                                         array dereference

f ∈ Field                                           field names


The  array-creation  term  creates  an  array  whose  elements  are  initially  the  undefined  value  undef . 
For  simplicity,  arrays  do  not  have  bounds  and  are  conceptually  infinite. 


7.2  Discussion 

Records  in  Impure  are  not  like  records  in  SML;  the  former  are  mutable,  but  the  latter  are 
immutable.  It  would  be  simple  to  add  SML  records,  because  they  are  just  a  variant  on  tuples. 
Like Pure tuples, there would be a special kind of value to model immutable records, supported
by context-independent primitive operations.

In  Section  4.5,  we  gave  a  discussion  of  various  kinds  of  errors,  and  the  way  in  which  they 
would  be  handled  with  our  semantic  methodology.  We  now  return  to  some  of  these  points. 

It may seem strange that arrays are infinite and that their elements are initially undef. In
contrast, the array creation function in SML takes both a size argument and an argument
providing the value to which all elements are initialized. It is not difficult to add a size component
to our arrays, which could be checked at run time for out-of-bounds errors. But an initialization
value would require a model for arrays that includes an explicit default value. This would render
the dereference and assignment operations nonuniform, and thus their syntactic occurrences in
transfer relations would be cumbersome. Therefore, we require that programmers write their
own initialization routine.

This language is rather primitive in that there is no static typechecking on records and
arrays do not have bounds. For instance, the Impure program

let x = {car = 1, cdr = 2}
in x.bad := 3; t

first creates a two-field record and then adds a third field. Also, the program

let x = {car = 1, cdr = 2}
in let y = x.bad in t

actually binds y to the undefined value undef and proceeds to execute t. The behavior of arrays
is similar. For instance, the Impure program

letarray x
in let y = x[3]
in x[200] := 55; x[200]

binds y to undef (dereferencing an uninitialized array element), assigns 55 to the array element
200, and then successfully dereferences the element, returning 55 as the result of the program.

Of course no reasonable language would function in this manner. Augmenting this language
with a static type system similar to SML's would reject programs that referenced incorrect field
names. But the situation with arrays is more serious, as proper handling of both uninitialized
elements and bounds checking must be relegated to run time, and thus to the dynamic
semantics. Our decision to simplify the situation by using infinite arrays is a compromise aimed
at simplifying the transfer relations that we will develop to model the dynamic semantics of the
language. A full language would have a mechanism for exception handlers, to which control
would flow in the case of array-bounds errors.


7.3  Syntax  Simplification 


As  with  Mini-C,  it  will  be  more  convenient  to  define  the  semantics  of  these  new  features  if  we 
rewrite  the  syntax  to  conform  more  closely  to  our  language  of  transfer  relations. 
We would like to consider the terms e.f and e[e'] as expressions. To do this, we need to add
the following primitive operations to Primop.

• All field names f ∈ Field as constant nullary operations.

• The context-dependent binary operation deref.

Now we rewrite the term e.f as the expression deref(e, f) and the term e[e'] as the expression
deref(e, e').

Similarly, we consider the e.f on the left-hand side of a record-field assignment to be an
l-expression and rewrite the e[e'] on the left-hand side of an array assignment as the l-expression
e.e'.

Our next transformation is to “compile” a record-allocation term in exactly the same way
as we did for Mini-C. To review, we need the following:

• for each natural number m, a pointer value (m) ∈ Val

• the unary primitive operation ptr that casts an integer m to the pointer (m) (formally,
ptr(m) = (m)), where we write (e) for ptr(e)

• a distinguished value o ∈ Val, to be used only in the reference o.o (written as O), which
is initialized to 1 at the beginning of execution and always holds the integer of the next
free pointer on the heap

We now perform the following transformation of a Pure program t. We first transform the
program to

let O = 1 in t

and then rewrite every subterm

let x = {f_1 = e_1, ..., f_n = e_n} in t

in t as the term

let x = (O)
in x.f_1 := e_1; ...; x.f_n := e_n; O := O + 1; t

Next we “compile” arrays in a similar fashion. We rewrite every subterm

letarray x in t

as the term

let x = (O) in O := O + 1; t.

Note that this treatment of allocation does not equate as many programs as one might
reasonably expect. For instance,

let y = {car = 3, cdr = 4} in y
and

let x = {car = 1, cdr = 2} in let y = {car = 3, cdr = 4} in y

have different meanings because they use different pointer values to construct the record bound
to y. This is a simple case of the well-studied problem with full abstraction for languages that
combine assignment and procedures [OT95, Sie94].
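
The allocation scheme, and the way it distinguishes these two programs, can be imitated in a
few lines of Python. This is a sketch under stated assumptions: the heap is a dictionary keyed
by pairs of values, the reference O is modeled by a separate "O" entry in the store, None
stands in for undef, and the helper names (`new_store`, `new_record`, `alloc`, `assign`,
`deref`) are hypothetical.

```python
def new_store():
    # "O" models the reference O (next free pointer), initialized to 1
    return {"O": 1, "heap": {}}

def assign(store, v, w, value):
    """Generic assignment v.w := value."""
    store["heap"][(v, w)] = value

def deref(store, v, w):
    """deref(v, w); None stands in for undef."""
    return store["heap"].get((v, w))

def alloc(store):
    """let x = (O) in O := O + 1, as used for letarray."""
    ptr = ("ptr", store["O"])
    store["O"] += 1
    return ptr

def new_record(store, fields):
    """let x = (O) in x.f1 := e1; ...; x.fn := en; O := O + 1"""
    ptr = ("ptr", store["O"])
    for name, value in fields.items():
        assign(store, ptr, name, value)
    store["O"] += 1
    return ptr

# let y = {car = 3, cdr = 4} in y
s1 = new_store()
y1 = new_record(s1, {"car": 3, "cdr": 4})

# let x = {car = 1, cdr = 2} in let y = {car = 3, cdr = 4} in y
s2 = new_store()
x2 = new_record(s2, {"car": 1, "cdr": 2})
y2 = new_record(s2, {"car": 3, "cdr": 4})

# letarray a in a[200] := 55, continuing in the first store
a = alloc(s1)
assign(s1, a, 200, 55)
```

The two records bound to y have the same fields but different pointer values, which is the
distinction noted above; an unassigned field or array element dereferences to the stand-in for
undef.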

At this point, we have simplified the new imperative constructs into a set of primitive
operations and a generic assignment term. Here is the final extension of Pure syntax for
transformed Impure programs.


t ::= ...
   |  e.e' := e''; t      assignment

p ::= ...
   |  f                   field name (nullary)
   |  deref               dereference (binary)
   |  ptr                 allocation (unary)


7.4  Semantics

The semantics of Pure required only an environment mapping variables to values. But as in
Mini-C, the mutable data structures require the expressiveness of stores. There are thus two
steps to define the semantics of these new imperative features:

• Recast the semantics of Pure in terms of stores rather than environments, in order to
support the mutable data structures of Impure.

• Give the semantics of the assignment term e.e' := e''; t.

It is fairly straightforward to recast the semantics of Pure to use stores instead of
environments. As we described in the design of Pure, an environment is just a restricted form of
store. Indeed, the isomorphism

\[ \mathit{Store} \cong \mathit{Env} \times \mathit{Heap} \]

where

\[ \mathit{Heap} = \mathit{Val} \times \mathit{Val} \to \mathit{Val} \]

makes this recasting convenient. The “heap” handles the bindings of references.¹ Below are
the meta-rules of Pure rewritten such that a state pairs a term with a store rather than with
an environment:

\[ \mathit{State} = \mathit{Term} \times \mathit{Store} \]

¹ Note that in the literature, a heap is often called a store. But we have already used the term “store” for
something else.
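In the following meta-rules, we freely switch notation between the isomorphic forms $\sigma \in \mathit{Store}$
and $(\rho, \gamma) \in \mathit{Env} \times \mathit{Heap}$. The only rules that are not essentially identical to the corresponding
rule of Pure are those for function creation and application. The interesting part about those
rules is that only the environment component of the store is saved in the closure and restored
upon application; the “heap” component is instead threaded through.

\[ \frac{e \vdash_\sigma v}{(\mathtt{let}\ x = e\ \mathtt{in}\ t,\ \sigma) \longmapsto (t,\ \sigma[x \mapsto v])} \]

\[ \frac{\sigma = (\rho, \gamma)}{(\mathtt{rec}\ x(\bar y) = t\ \mathtt{in}\ t',\ \sigma) \longmapsto (t',\ \sigma[x \mapsto (x(\bar y) = t,\ \rho)])} \]

\[ \frac{e \vdash_\sigma v \qquad (\bar e)_i \vdash_\sigma (\bar v)_i \qquad v = (x(\bar y) = t,\ \rho') \qquad \sigma = (\rho, \gamma)}{(e(\bar e),\ \sigma) \longmapsto (t,\ (\rho'[x \mapsto v][\bar y \mapsto \bar v],\ \gamma))} \]

\[ \frac{e \vdash_\sigma \mathit{true}}{(\mathtt{if}\ e\ \mathtt{then}\ t\ \mathtt{else}\ t',\ \sigma) \longmapsto (t,\ \sigma)} \]

\[ \frac{e \vdash_\sigma \mathit{false}}{(\mathtt{if}\ e\ \mathtt{then}\ t\ \mathtt{else}\ t',\ \sigma) \longmapsto (t',\ \sigma)} \]


All that remains is to give the rule for the generic assignment term, which is quite
straightforward:

\[ \frac{e \vdash_\sigma v \qquad e' \vdash_\sigma v' \qquad e'' \vdash_\sigma v''}{(e.e' := e'';\ t,\ \sigma) \longmapsto (t,\ \sigma[v.v' \mapsto v''])} \]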

Again, we would like to give a formulation of this transition system in terms of single-step
transfer relations instead of meta-rules. Because we designed transfer relations to be relations
on stores and not simply environments, the single-step transfer relations for Pure work without
change for Impure. But recall that those definitions made use of the bisimulation relation ~
on both values and stores. So, two tasks remain:

• Extend the bisimulation relation to states with stores.

• Define the single-step transfer relations for the generic assignment term.

The first task is straightforward; two heaps are similar if all corresponding nodes (values) are
similar. This is given by the following definition.

Definition 15 (Similar states with general stores) Two states $(t, (\rho, \gamma))$ and $(t', (\rho', \gamma'))$
are said to be similar (written $(t, (\rho, \gamma)) \sim (t', (\rho', \gamma'))$) if $(t, \rho) \sim (t', \rho')$ and
$\gamma(v.v') \sim \gamma'(v.v')$ for all values $v, v' \in \mathit{Val}$.
The second task is also straightforward, as it is very similar to the assignment statement in
Mini-C.

\[ \Delta_{(e.e' := e'';\ t),\,t} \;=\; e.e' \mapsto e'' \]

Note that the transfer relations conceptually treat both let binding and assignment as special
cases of assignment of l-expressions.


7.5  Final  Words  on  First-Class  Functions 

We have seen several ways in which our model of a language with first-class functions is not quite
natural. For one, the transfer relations that save an environment in a closure and restore the
environment upon function application process each free variable separately. This approach is
an artifact of the natural choice to model language variables with store variables. Alternatively,
one could model an environment as a record whose field names are the language variables, and
maintain only a single store variable E that is always bound to the environment. This requires
an extra level of indirection, through E, for variable operations. But for that price, one gains
the ability to save and restore environments in closures easily, because they are simply records.

Related to this issue is the l-value O that points to the index of the next free heap location,
used for the creation of new records and arrays. In Mini-C, we used the variable H for this
purpose, but we cannot use a variable in a language with first-class functions because in that
case it could be saved in a closure and restored on function application. Instead, we need a
global assignable variable. To avoid the need to treat a distinguished variable differently from
all others, we instead chose to use a reference O = o.o. Again, this problem would have a nicer
solution if we modeled environments as we suggested in the preceding paragraph. Then there
would be only two variable bindings in the store: E, bound to the current environment, and H,
bound to the index of the next free heap location.


Part  IV 

Analysis  Applications 


Chapter  8 


Multi-step  Program  Analysis 


In  Chapter  1  we  isolated  a  methodological  difficulty  with  program  analyses:  they  apply  an 
abstraction  between  every  execution  step  of  the  analyzed  program.  We  explained  that  this 
severely  cripples  the  quality  of  an  analysis  on  source  programs  for  which  a  desired  property  is 
temporarily  weakened  during  a  period  of  a  few  program  steps.  As  an  example,  we  gave  a  simple 
analysis  of  the  signs  of  integer-valued  variables,  but  we  also  explained  that  this  is  a  problem 
for  other  kinds  of  analyses,  such  as  shape  analyses. 

For  instance,  consider  the  following  IMPURE  program  that  destructively  reverses  a  binary 
tree. 

1  rec reverse(x, k) =
2    if leaf(x)
3    then k()
4    else rec k1() =
5           rec k2() =
6             let temp = x.l
7             in x.l := x.r;
8                x.r := temp;
9                k()
A           in reverse(x.r, k2)
B         in reverse(x.l, k1)
C  in reverse(x, k)

One would like a shape analysis to determine that when reverse is called with a data structure
that actually is a binary tree, the data structure is still a tree on termination of the
procedure. Most shape analyses cannot determine this information. To our knowledge, only
[SRW96] can achieve this result, but it is highly specialized for this and similar cases and
requires quite restrictive conditions, as explained in Section 1.1.

But for now we wish to point out why this program is so difficult to analyze. Consider the
following informal description of what happens every time execution reaches term 6.

6 : x is a tree with subtrees L and R

7 : x is a tree with subtrees L and R

8 : x is not a tree: its left and right links both point to subtree R

9 : x is a tree with subtrees R and L

Program analyses infer, or abstract, a property at every step, and so it is difficult to cope with
the states at term 8. An analysis would need to have the ability to describe the special property
at 8 with sufficient detail to infer that the assignment at 8 changes x back to a tree. In fact,
that is what [SRW96] does to solve this particular problem.
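
The temporary loss of the tree property can be observed directly by replaying terms 6 through
8 on a concrete heap. The following Python sketch is only an illustration; the `Node` class
and the `is_tree` check (no node reachable along two different paths) are assumptions of this
example, not part of any analysis described here.

```python
class Node:
    def __init__(self, l=None, r=None):
        self.l, self.r = l, r

def is_tree(root):
    """True if no node is reachable along two different paths from root."""
    seen = set()
    stack = [root]
    while stack:
        n = stack.pop()
        if n is None:
            continue
        if id(n) in seen:
            return False      # sharing (or a cycle): not a tree
        seen.add(id(n))
        stack.extend([n.l, n.r])
    return True

L, R = Node(), Node()
x = Node(L, R)

trace = [is_tree(x)]          # state on reaching term 6
temp = x.l                    # term 6: let temp = x.l
trace.append(is_tree(x))      # state at term 7
x.l = x.r                     # term 7: x.l := x.r
trace.append(is_tree(x))      # state at term 8: both links point to R
x.r = temp                    # term 8: x.r := temp
trace.append(is_tree(x))      # state at term 9

print(trace)
```

An analysis that abstracts after every step must represent the third state, in which both
links share R, precisely enough to recover the tree property one step later.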

But there is a much more general solution, and that is to avoid the necessity to infer a
property at every step, and instead allow multiple steps of execution before abstracting. In
order to explain why this is not already a part of program-analysis methodology, we must take
a step back and examine the foundations of semantics-based program analysis.


8.1  A  Review  of  Abstract  Interpretation 

Abstract interpretation [CC77] is a general framework for expressing semantics-based program
analyses. In fact it is more than that; it is a general framework for relating different semantics
of a language, some of which may be effectively computable for all programs and therefore in
general approximate, or inadequate, as a semantic definition of the language. Such computable
“semantics” are program analyses, and with abstract interpretation they are always related to
some adequate semantics of the language.

A semantics of a language is a function

\[ \mathcal{M} \in \mathit{Prog} \to \mathit{SemObj} \]

mapping program texts to semantic objects. The main observation of abstract interpretation
is that $\mathcal{M}[\![P]\!]$ is usually defined as a fixed point, and the potential that its iterative definition
may be transfinite directly reflects the potential that $P$ may not terminate. In other words,

\[ \mathcal{M}[\![P]\!] = \mathit{fix}(\mathcal{S}[\![P]\!]) \]

where

\[ \mathcal{S} \in \mathit{Prog} \to \mathit{SemObj} \to \mathit{SemObj} \]

and SemObj is equipped with a partial order and fix computes some fixed point of its parameter,
usually the least, but sometimes the greatest, depending on the particular semantics.
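
For intuition, a least fixed point over a finite lattice can be computed by straightforward
iteration from the bottom element. The Python sketch below is a minimal illustration with
assumed names (`lfp`, `step_fn`, `edges`): the semantic objects are sets of states of a small,
made-up transition system, ordered by inclusion, and the iterated function adds the initial
state and all one-step successors. It terminates only because this lattice is finite; as noted
above, the iteration may be transfinite in general.

```python
def lfp(f, bottom):
    """Iterate f from bottom until the value stops changing."""
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# A hypothetical four-state transition system; state 3 is unreachable from 0.
edges = {0: {1}, 1: {2}, 2: {1}, 3: {0}}

def step_fn(states):
    successors = frozenset(t for s in states for t in edges[s])
    return frozenset({0}) | states | successors

reach = lfp(step_fn, frozenset())
print(sorted(reach))
```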

Abstract interpretation explains how to relate such a semantics to a more abstract semantics.
One first designs a partial order $\widehat{\mathit{SemObj}}$ of abstract semantic objects and then defines the
function

$\alpha \in \mathit{SemObj} \to \widehat{\mathit{SemObj}}$


called the abstraction function that projects a semantic object onto the abstract semantic
domain. If $\alpha$ is additive, then one can induce a unique corresponding concretization function

$\gamma \in \widehat{\mathit{SemObj}} \to \mathit{SemObj}$

defined as

$\gamma\,\hat{y} = \bigsqcup_{\mathit{SemObj}} \{ x \in \mathit{SemObj} \mid (\alpha\,x) \sqsubseteq \hat{y} \}$

that “coerces” an abstract semantic object into the more concrete semantic domain. Then from
[CC79], the function

$\hat{\mathcal{S}} \in \mathit{Prog} \to \widehat{\mathit{SemObj}} \to \widehat{\mathit{SemObj}}$

defined as

$\hat{\mathcal{S}}[\![P]\!] = \alpha \circ \mathcal{S}[\![P]\!] \circ \gamma$

corresponds to $\mathcal{S}$ in such a way that the fixed point $\mathit{fix}(\hat{\mathcal{S}}[\![P]\!])$ is an abstraction of the semantics
$\mathcal{M}[\![P]\!]$. In other words, $\alpha(\mathcal{M}[\![P]\!])$ implies the property $\mathit{fix}(\hat{\mathcal{S}}[\![P]\!])$. We omit the formal details
of this correspondence and refer the reader to [CC79]. Intuitively, an abstract semantic object
is like a semantic object, but with some information missing, and $\hat{\mathcal{S}}[\![P]\!]$ first applies $\mathcal{S}[\![P]\!]$ to
the information still present and then abstracts the result. We give an example below.

Sometimes,  the  information  missing  from  abstract  semantic  objects  is  not  necessary  to 
model  the  language.  For  instance,  much  of  the  study  of  pure  semantics  is  concerned  with 
finding  semantic  objects  that  are  as  abstract  as  possible  while  still  adequate  as  a  semantic 
definition;  the  ultimate  goal  here  is  full  abstraction  [Mul87].  But  for  the  purpose  of  program 
analysis,  it  is  necessary  to  abstract  away  crucial  information  for  the  sake  of  computability.  The 
choice  of  what  to  abstract  away  defines  the  program  analysis. 

A central intuition is that the function $\mathcal{S}[\![P]\!]$ typically corresponds in some sense to a “step”
of an execution of P. We cannot formalize this correspondence because that would require a
semantic definition of “execution step” in the first place, resulting in a meaningless circular
definition. Nevertheless, this intuition is an invaluable aid in visualizing a semantic definition.
In fact, the word “interpretation” in abstract interpretation comes from this intuition, because
one can view the repeated applications of $\mathcal{S}[\![P]\!]$ in its iterative fixed-point computation as the
steps of an interpreter, or an “abstract interpreter” in the case of $\hat{\mathcal{S}}[\![P]\!]$.


8.2  Abstract  Interpretation  of  Transition  Systems 


The preceding discussion does not specify or even impose any serious limitations on the
semantic  objects.  Because  the  seminal  work  on  abstract  interpretation  ([CC77])  uses  a  rather 
simple  transition-system  semantics  for  expository  purposes,  abstract  interpretation  is  often 
misunderstood  to  be  limited  to  flowchart-based  semantics  of  while-loop  languages.  However, 
appearing  soon  after  that  seminal  paper,  Patrick  Cousot’s  thesis  ([Cou78])  showed  the  full 
maturity  of  the  framework. 
But in Chapter 4 we argued that a transition system is indeed particularly useful as a
basis  for  program  analysis,  despite  much  work  elsewhere.  Our  methodology  is  designed  around 
transition-system  semantics,  and  so  we  would  like  to  examine  abstract  interpretations  based  on 
transition  systems. 

As explained in Chapter 4, a state of a transition system is a pair of a control point and a
store:

$\mathit{State} = \mathit{CtrlPoint} \times \mathit{Store}$

The transition relation defines the single execution steps of a particular program P as pairs of
states:

$\longmapsto \;\subseteq\; (\mathit{State} \times \mathit{State})$

As  introduced  in  Chapter  1,  the  transition-system  semantics  of  a  language  is  a  function 

$\mathcal{M} \in \mathit{Prog} \to \mathcal{P}(\mathit{State}^{*})$

that,  given  a  program  P,  returns  a  set  of  finite  execution  prefixes,  represented  as  state  sequences, 
defined  inductively  by  unfolding  the  transition  relation  from  a  base  set  of  initial  states  (length- 
one  sequences): 

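$\dfrac{\vec{\psi}.\psi \in \mathcal{M}[\![P]\!] \qquad \psi \longmapsto \psi'}{\vec{\psi}.\psi.\psi' \in \mathcal{M}[\![P]\!]}$

Above, we claimed that $\mathcal{M}[\![P]\!]$ should be expressible as a fixed point $\mathit{fix}(\mathcal{S}[\![P]\!])$. Here,

$\mathcal{S}[\![P]\!] \in \mathcal{P}(\mathit{State}^{*}) \to \mathcal{P}(\mathit{State}^{*})$

is the function

$\mathcal{S}[\![P]\!]\,\Psi = \Psi \cup \{ \vec{\psi}.\psi.\psi' \mid \vec{\psi}.\psi \in \Psi \wedge \psi \longmapsto \psi' \}$

defined by the above rule to perform one inductive application of the rule. Then the semantics
$\mathcal{M}[\![P]\!]$ of program P is the least fixed point of $\mathcal{S}[\![P]\!]$ above a set $\Psi_0$ of initial states, and is
precisely the set of (unbounded) finite prefixes of executions starting at $\Psi_0$.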

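For the purposes of abstract interpretation, the set SemObj of semantic objects is the set
$\mathcal{P}(\mathit{State}^{*})$ of sets of finite execution sequences, ordered by inclusion. An abstract interpretation
must provide a partial order $\widehat{\mathit{SemObj}}$ of abstract semantic objects and an abstraction function

$\alpha \in \mathcal{P}(\mathit{State}^{*}) \to \widehat{\mathit{SemObj}}$.

The rest of the abstract interpretation is mechanical. Suppose $\hat{\Psi}$ is the least fixed point of

$(\alpha \circ \mathcal{S}[\![P]\!] \circ \gamma) \in \widehat{\mathit{SemObj}} \to \widehat{\mathit{SemObj}}$

above an initial abstract semantic object $\hat{\Psi}_0 \in \widehat{\mathit{SemObj}}$ such that $\Psi_0 \subseteq (\gamma\,\hat{\Psi}_0)$. Then as we
explained in the previous section, $\hat{\Psi}$ abstracts $\mathcal{M}[\![P]\!]$. In other words, $\alpha(\mathcal{M}[\![P]\!])$ implies the
property $\hat{\Psi}$, or, equivalently, $\mathcal{M}[\![P]\!] \subseteq (\gamma\,\hat{\Psi})$.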
8.3  Invariant  Properties

Most  program  analyses  compute  invariant  properties,  or  properties  of  the  states  that  occur 
during  program  execution.  In  this  case,  it  is  convenient  to  perform  the  above  abstraction  in 
two  steps.  The  first  step  abstracts  a  set  of  execution  sequences  by  the  set  of  states  appearing 
in  the  sequences.  In  other  words,  the  abstract  semantic  object  is  P(State),  and  the  abstraction 
function 

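$\alpha \in \mathcal{P}(\mathit{State}^{*}) \to \mathcal{P}(\mathit{State})$

is defined as

$\alpha\,\Psi = \{ \psi \mid \exists \vec{\psi} \in \Psi .\ \psi \text{ appears in } \vec{\psi} \}$.

The concretization function

$\gamma \in \mathcal{P}(\mathit{State}) \to \mathcal{P}(\mathit{State}^{*})$

is induced from $\alpha$ as described above. Pushing this through abstract interpretation defines a
function

$\mathcal{S} \in \mathit{Prog} \to \mathcal{P}(\mathit{State}) \to \mathcal{P}(\mathit{State})$

defined as

$\mathcal{S}[\![P]\!]\,\Psi = \{ \psi' \mid \psi \in \Psi \wedge \psi \longmapsto \psi' \}$

whose least fixed point $\mathcal{M}[\![P]\!] \in \mathcal{P}(\mathit{State})$ above a set $\Psi_0$ of initial states is precisely the set of
states reached during an execution from an initial state in $\Psi_0$.

An invariant property is thus a superset of $\mathcal{M}[\![P]\!]$, which is given by an abstract
interpretation.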
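
To make the least fixed point above a set of initial states concrete, the following is a minimal sketch in Python (not part of the original development). It assumes a finite, explicitly listed transition relation; the names `step`, `reachable`, and `TRANS`, and the toy states, are hypothetical. The loop accumulates successors until no new state appears.

```python
def step(trans, states):
    """One execution step: the successors of `states` under the relation `trans`."""
    return {t for (s, t) in trans if s in states}


def reachable(trans, init):
    """Accumulate successors, starting from `init`, until nothing new is found."""
    seen = set(init)
    while True:
        new = step(trans, seen) - seen
        if not new:
            return seen
        seen |= new


# Hypothetical toy relation over states (control point, value of x).
TRANS = {
    ((0, 3), (1, 2)), ((1, 2), (1, 1)), ((1, 1), (1, 0)), ((1, 0), (2, 0)),
    ((0, 9), (1, 8)),
}

print(sorted(reachable(TRANS, {(0, 3)})))
```

Any superset of the returned set is an invariant property in the sense above; the state (1, 8) is excluded because it is reachable only from the initial state (0, 9).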


8.4  An  Example 

As  an  example,  we  consider  the  example  in  Chapter  1  of  the  analysis  of  the  signs  of  integer 
variables.  In  this  example,  an  execution  state  comprises  a  control  point,  specifying  the  syntactic 
point  of  execution,  and  an  environment. 

$\psi \in \mathit{State} = \mathit{CtrlPoint} \times \mathit{Env}$

The step function $\mathcal{S}[\![P]\!]$ of a program P maps a set of states to their successors as given in the
previous section. The semantics $\mathcal{M}[\![P]\!]$ is the least fixed point of $\mathcal{S}[\![P]\!]$ above $\{(C_0, \rho_0)\}$; it is the
set of states reachable during an execution from the initial control point $C_0$ and environment
$\rho_0$.

Following the example in Section 1.1, we define an abstract semantic object as a table of
sign environments indexed by control point.

$\hat{\Psi} \in \widehat{\mathit{State}} = \mathit{CtrlPoint} \to \mathit{Var} \to \mathit{Sign}$
Here, Sign is the complete lattice given in Section 1.1. The function $\alpha \in \mathcal{P}(\mathit{State}) \to \widehat{\mathit{State}}$
abstracts a set of states by choosing for each control point C and variable x the strongest sign
property satisfied by all bindings of x in environments at C.

$\alpha\,\Psi\,C\,x = \bigsqcap \{ \hat{n} \in \mathit{Sign} \mid ((C, \rho) \in \Psi \wedge (\rho\,x) = n) \Rightarrow n \in \hat{n} \}$


Here  is  an  example  of  an  abstraction  of  a  set  of  three  states,  two  of  which  have  the  same  control 
point. 


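$\alpha \left\{ \begin{array}{l} (C_1, [x, y \mapsto 2, 3]), \\ (C_1, [x, y \mapsto -1, 4]), \\ (C_2, [x, y \mapsto 0, 0]) \end{array} \right\} = \left[ \begin{array}{l} C_1 \mapsto [x, y \mapsto \mathit{int}, \mathit{pos}], \\ C_2 \mapsto [x, y \mapsto \mathit{zero}, \mathit{zero}] \end{array} \right]$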
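
The abstraction of the three states can be reproduced with a short Python sketch. This is only an illustration: it assumes the sign lattice has the seven elements none, neg, zero, pos, nonpos, nonneg, and int, and the names `SIGNS`, `strongest`, and `alpha` are hypothetical. Control points at which no state occurs are simply omitted from the resulting table (they would map every variable to none).

```python
# Each sign property denotes a set of base signs; "none" is bottom, "int" is top.
SIGNS = {
    "none": frozenset(), "neg": frozenset("-"), "zero": frozenset("0"),
    "pos": frozenset("+"), "nonpos": frozenset("-0"), "nonneg": frozenset("0+"),
    "int": frozenset("-0+"),
}


def base(n):
    """Base sign of a single integer."""
    return "-" if n < 0 else "0" if n == 0 else "+"


def strongest(ns):
    """Strongest sign property satisfied by every integer in `ns`."""
    need = frozenset(base(n) for n in ns)
    fits = [s for s, members in SIGNS.items() if need <= members]
    return min(fits, key=lambda s: len(SIGNS[s]))


def alpha(states):
    """Abstract a collection of (control point, environment) states to a sign table."""
    values = {}
    for c, env in states:
        for x, n in env.items():
            values.setdefault((c, x), []).append(n)
    table = {}
    for (c, x), ns in values.items():
        table.setdefault(c, {})[x] = strongest(ns)
    return table


states = [("C1", {"x": 2, "y": 3}), ("C1", {"x": -1, "y": 4}), ("C2", {"x": 0, "y": 0})]
print(alpha(states))
```

At C1 the variable x is bound to both 2 and -1, so the strongest property covering both is int, while y is bound only to positive values.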


As we explained above, the concretization function $\gamma \in \widehat{\mathit{State}} \to \mathcal{P}(\mathit{State})$ is induced from $\alpha$ as

$\gamma\,\hat{\Psi} = \{ \psi \mid \alpha\{\psi\} \sqsubseteq \hat{\Psi} \}$

where $\sqsubseteq$ is pointwise inclusion (in other words, pointwise property implication), but for
illustration we give the alternate definition that intuitively expands the sign properties into the
integers that they represent:

$\gamma\,\hat{\Psi} = \{ (C, \rho) \mid (\rho\,x) = n \Rightarrow n \in (\hat{\Psi}\,C\,x) \}$

Another way of thinking about $\gamma$ is that it specifies the states that are consistent with the given
sign properties.

Next, $\hat{\mathcal{S}}[\![P]\!]$ is defined mechanically:

$\hat{\mathcal{S}}[\![P]\!] = \alpha \circ \mathcal{S}[\![P]\!] \circ \gamma$

In other words, $\hat{\mathcal{S}}[\![P]\!]$, given an abstract semantic object $\hat{\Psi}$, first applies $\gamma$, yielding all the states
consistent with the sign properties in $\hat{\Psi}$, then applies the transition relation $\longmapsto$ to these states,
yielding their successors, and finally applies $\alpha$ to these successor states, abstracting them by a
semantic object describing their sign properties.

Given an initial abstract semantic object $\hat{\Psi}_0$ such that $(C_0, \rho_0) \in (\gamma\,\hat{\Psi}_0)$, the least fixed
point of $\hat{\mathcal{S}}[\![P]\!]$ above $\hat{\Psi}_0$ gives sign properties that hold during the execution of P. For example,
if P is the while-loop program presented earlier, the result of the analysis is the table of
five sign environments shown in Section 1.1 next to their respective program points, with the
last one corresponding to the “exit” program point.

It is worth considering again the analogy given in Section 1.1 of computing the rounded sum
of a list of numbers. In this analogy, a semantic object $\Psi \in \mathcal{P}(\mathit{State})$ corresponds to a precise
real number, and its abstraction $(\alpha\,\Psi) \in \widehat{\mathit{State}}$ corresponds to the rounding of that number.
There is no equivalent of $\gamma$ because a rounded real number is still a real number, but in general
we need $\gamma$ to “coerce” a member of $\widehat{\mathit{State}}$ back into a member of $\mathcal{P}(\mathit{State})$. Then applying $\hat{\mathcal{S}}[\![P]\!]$,
to take one step of program execution and then abstract, corresponds to processing (adding)
the next number from the list and then immediately rounding the result.
8.5  Performing  Multiple  Steps  Between  Abstractions

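An abstract interpretation computes the fixed point of the abstract step function $\hat{\mathcal{S}}[\![P]\!]$. One
can write this fixed point as the limit of the sequence:

$\hat{\Psi}_1 = \hat{\mathcal{S}}[\![P]\!]\,\hat{\Psi}_0$

$\hat{\Psi}_2 = \hat{\mathcal{S}}[\![P]\!]\,\hat{\Psi}_1 = (\hat{\mathcal{S}}[\![P]\!] \circ \hat{\mathcal{S}}[\![P]\!])\,\hat{\Psi}_0$

$\hat{\Psi}_3 = \hat{\mathcal{S}}[\![P]\!]\,\hat{\Psi}_2 = (\hat{\mathcal{S}}[\![P]\!] \circ \hat{\mathcal{S}}[\![P]\!] \circ \hat{\mathcal{S}}[\![P]\!])\,\hat{\Psi}_0$

$\vdots$

This sequence first adds in the objects (for instance, states or state sequences) reachable in one
step from $\hat{\Psi}_0$, abstracts, adds in the objects reachable in the next step, abstracts, and so forth.
By the definition of $\hat{\mathcal{S}}[\![P]\!]$,

$\hat{\mathcal{S}}[\![P]\!] \circ \hat{\mathcal{S}}[\![P]\!] \circ \hat{\mathcal{S}}[\![P]\!]$

is equivalent to

$\alpha \circ \mathcal{S}[\![P]\!] \circ \gamma \circ \alpha \circ \mathcal{S}[\![P]\!] \circ \gamma \circ \alpha \circ \mathcal{S}[\![P]\!] \circ \gamma$.

This illustrates the abstraction (with $\alpha$) at every step. But in Section 1.1 we explained that
it is more accurate to defer the abstraction for a few steps. Mathematically, this is easy to
express: simply remove the occurrences of $\gamma \circ \alpha$ during the desired interval. Thus,

$\alpha \circ \mathcal{S}[\![P]\!] \circ \mathcal{S}[\![P]\!] \circ \mathcal{S}[\![P]\!] \circ \gamma$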

performs three steps before abstracting, and consequently may yield more accurate results than
applying $\hat{\mathcal{S}}[\![P]\!]$ three times. (The formal justification of this is in [CC92d].) As we explained
above and in Section 1.1, this increase in accuracy can be striking. This technique yielded
better sign properties in our small example of Section 1.1, in which the three steps were the
three assignments of the loop body; but much more importantly, any analysis of properties
that might be temporarily lost during execution, such as data shape properties, stands to gain
from this technique. This class of analyses is quite large.
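
The gain from deferring the abstraction can be observed on the two assignments y := x - 3; x := y + 5 from the example of Section 1.1. The Python sketch below is an illustration only, under a loud assumption: the concretization is restricted to a finite window of integers (`WINDOW`) so that it can be enumerated, which is exactly what a real analysis cannot rely on. All names (`WINDOW`, `sign_of`, `gamma`, `alpha`, `step1`, `step2`) are hypothetical.

```python
WINDOW = range(-50, 51)  # finite stand-in for the integers (assumption, not sound in general)

MEANING = {
    "none": lambda n: False, "neg": lambda n: n < 0, "zero": lambda n: n == 0,
    "pos": lambda n: n > 0, "nonpos": lambda n: n <= 0, "nonneg": lambda n: n >= 0,
    "int": lambda n: True,
}


def sign_of(ns):
    """Strongest sign property covering the integers in `ns`."""
    ns = set(ns)
    if not ns:
        return "none"
    for name in ("pos", "neg", "zero", "nonneg", "nonpos"):
        if all(MEANING[name](n) for n in ns):
            return name
    return "int"


def gamma(env):
    """All windowed concrete environments (x, y) consistent with a sign environment."""
    sx, sy = env
    return {(x, y) for x in WINDOW for y in WINDOW if MEANING[sx](x) and MEANING[sy](y)}


def alpha(envs):
    """Sign environment describing a set of concrete environments (x, y)."""
    envs = list(envs)
    return (sign_of(x for x, _ in envs), sign_of(y for _, y in envs))


def step1(envs):  # y := x - 3
    return {(x, x - 3) for x, _ in envs}


def step2(envs):  # x := y + 5
    return {(y + 5, y) for _, y in envs}


start = ("pos", "int")
every_step = alpha(step2(gamma(alpha(step1(gamma(start))))))  # abstract after each step
deferred = alpha(step2(step1(gamma(start))))                  # abstract once, at the end
print(every_step, deferred)
```

Abstracting after the first assignment forgets that y was computed from a positive x, so the second assignment can only conclude that x is an int; deferring the abstraction keeps the relationship and recovers that x is pos.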

Implementing this technique would seem to be a simple engineering issue: just remove
the selected occurrences of $\gamma \circ \alpha$. This is an illusion, however. The problem is that the
function $\hat{\mathcal{S}}[\![P]\!]$ is specified to be $\alpha \circ \mathcal{S}[\![P]\!] \circ \gamma$, but is never implemented that way. Indeed,
it is not possible to manipulate the semantic objects (members of SemObj, perhaps sets of
states or state sequences as described above) themselves because they are usually not
computer-representable. For instance, consider the sign-analysis example. It is not possible to compute $\gamma$,
yielding an (almost certainly) infinite set of states, apply $\mathcal{S}[\![P]\!]$ to find their successor states, and
abstract the resulting infinite set of states. Instead, one always designs a monolithic algorithm
to compute $\hat{\mathcal{S}}[\![P]\!]$, or at least a function above $\hat{\mathcal{S}}[\![P]\!]$ in its pointwise ordering, along with a
soundness proof. Because this algorithm cannot be separated into the three stages of $\gamma$,
$\mathcal{S}[\![P]\!]$, and $\alpha$, there is no general engineering solution to omit the computation of $\gamma \circ \alpha$ between
two iterative applications of the algorithm. We give an example to illustrate this in the next
section.
One might attempt to attack the problem from the different angle of beginning with a
semantics that uses a coarser-grained step function, such as

$\mathcal{S}^3[\![P]\!] = \mathcal{S}[\![P]\!] \circ \mathcal{S}[\![P]\!] \circ \mathcal{S}[\![P]\!]$

Then the function $\hat{\mathcal{S}}^3[\![P]\!] = \alpha \circ \mathcal{S}^3[\![P]\!] \circ \gamma$ specifies an analysis that abstracts only after every
third execution step. However, this different line of attack again encounters a barrier in practice.
Although $\mathcal{S}^3[\![P]\!]$ is certainly a reasonable mathematical function, any algorithm for $\hat{\mathcal{S}}^3[\![P]\!]$ must
in general be able to handle all possible combinations of three adjacent steps. For instance,
consider just the two interesting adjacent steps in the example of Section 1.1:

    y := x - 3;
    x := y + 5

An algorithm that combines these two steps would have to recognize the special pattern
occurring here that preserves the property that x is positive, and this pattern would have to be
included explicitly in the algorithm. Again, there does not seem to be a general approach, or
at least an approach that is combinatorially reasonable to even specify.

To understand this difficulty further, consider a program analysis based on a
transition-system semantics. As we explained in Chapter 4, one typically defines the single-step transition
relation $\longmapsto$ with meta-rules that specify how the individual pieces of program syntax induce
transitions. For instance, the semantics of PURE included the following rule for let-binding
transitions.

$(\mathtt{let}\ x = e\ \mathtt{in}\ t,\ \rho) \longmapsto (t,\ \rho[x \mapsto \mathcal{E}[\![e]\!]\rho])$

In the typical approach to program-analysis design, one would “bake” the abstraction into
such a rule. The program analysis designer would hand-design an algorithm that “abstractly”
performs this kind of transition. For instance, if $\widehat{\mathit{SemObj}}$ is the set of tables of sign environments
indexed by control point, as described above, then a straightforward algorithm to compute $\hat{\mathcal{S}}[\![P]\!]$
for some PURE program P will be hard-wired to propagate the sign property of expression e
at control point (let x = e in t) to variable x at control point t for each let-binding term in
P. This makes intuitive sense: the algorithm is “abstractly interpreting” the let-binding steps.
But of course the analysis designer should justify these intuitions by proving that the algorithm
for $\hat{\mathcal{S}}[\![P]\!]$ actually implements the function

$\alpha \circ \mathcal{S}[\![P]\!] \circ \gamma$.

Hence, the algorithm never directly manipulates states or state sequences, but instead
performs the function $\hat{\mathcal{S}}[\![P]\!]$ in one go, where $\alpha$ and $\gamma$ are “baked into” the transition relation $\longmapsto$.
Note that:

1. To apply an existing analysis to a different language, one must separately hand-design a
new algorithm for the meta-rules of that language. This is an engineering disadvantage.
2. Because the abstraction is included in the analysis algorithm and cannot be separated
as a single module, there is no way to perform multiple execution steps before abstracting the
result. This is a more serious disadvantage because, as we explained in Section 1.1, it can
have devastating effects on the quality of the analysis.

The  preceding  discussion  formalizes  the  intuition  behind  why  both  the  small  program  in 
Section  1.1  and  the  reverse  program  at  the  beginning  of  this  chapter  are  difficult  to  analyze 
accurately.  Our  solution  in  Section  1.1  was  to  change  the  program  itself,  rewriting  the  sequence 
of  instructions  in  the  loop  body  with  a  single  parallel  instruction.  In  that  way,  we  achieved  an 
effect similar to the $\mathcal{S}^3[\![P]\!]$ idea above. Although this “compilation” of the three instructions
into  a  single  instruction  was  at  the  time  for  expository  purposes,  we  now  have  the  semantic 
methodology  of  transfer  relations  as  a  general  solution. 


8.6  Multi-step  Abstract  Interpretation  with  Transfer  Relations 

In  previous  chapters,  we  demonstrated  that  our  language  of  transfer  relations  is  expressive 
enough  to  model  advanced  language  features  such  as  first-class  functions  and  mutable  data 
structures.  We  now  show  that  it  may  be  used  as  a  “back  end”  for  a  generalized  program- 
analysis  methodology  based  on  abstract  interpretation  in  which  multiple  program  steps  may  be 
assimilated  between  abstractions. 

In Section 8.3 we explained that a common choice of concrete semantic object for program
analysis is a state set (or property). As we described in Chapter 4, in the semantic methodology
of transfer relations, a state is a pair of a control point and a store.

$\mathit{State} = \mathit{CtrlPoint} \times \mathit{Store}$

A set of states is thus isomorphic to a function from control point C to the set (or property) of
stores occurring in states at C:

$\Psi \in \mathit{SemObj} = \mathcal{P}(\mathit{State}) \cong \mathit{CtrlPoint} \to \mathcal{P}(\mathit{Store})$

Let CtrlPoint be the finite set of control points occurring in a particular program P. Given a
binary relation R, let $[R] = \lambda X.\{ y \mid x \in X \wedge x\,R\,y \}$. Then

$\mathcal{S}[\![P]\!]\,\Psi\,C' = \bigcup_{C} [\Delta_{C,C'}]\,(\Psi\,C)$.

Intuitively, the set of stores at control point C' comes from the stores at all control points C that
might precede C' by one step in an execution, or in other words by one link in a control-flow
graph of P. But now we can express multiple steps with relation composition. For instance

$\mathcal{S}^2[\![P]\!]\,\Psi\,C'' = (\mathcal{S}[\![P]\!] \circ \mathcal{S}[\![P]\!])\,\Psi\,C'' = \bigcup_{C,C'} [\Delta_{C,C'} \circ \Delta_{C',C''}]\,(\Psi\,C) = \bigcup_{C,C'} [\Delta_{C,C',C''}]\,(\Psi\,C)$.
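
The lifting [R] and its interaction with composition can be checked on explicit finite relations. The Python sketch below is an illustration under the assumption that relations are given as finite sets of pairs; `image`, `compose`, `R1`, and `R2` are hypothetical names, and the toy "stores" are plain integers.

```python
def image(rel):
    """[R]: lift a binary relation (a set of pairs) to a function on sets."""
    return lambda xs: {y for (x, y) in rel if x in xs}


def compose(r1, r2):
    """Relational composition: x is related to z iff x r1 y and y r2 z for some y."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}


# Two hypothetical single-step relations on toy stores.
R1 = {(n, n + 1) for n in range(5)}
R2 = {(n, 2 * n) for n in range(7)}

start = {0, 1}
two_steps = image(R2)(image(R1)(start))   # apply the steps one after the other
one_shot = image(compose(R1, R2))(start)  # compose first, then apply once
print(two_steps, one_shot)
```

Both routes yield the same set of stores, which is the identity used above to replace two applications of the step function by a single application of the composed relation.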
In general, because the set of control points of a program is finite, we need only design a join
semilattice $\widehat{\mathit{Store}}$ of abstract store properties and an abstraction function $\alpha \in \mathcal{P}(\mathit{Store}) \to \widehat{\mathit{Store}}$,
with induced concretization function $\gamma$.

$\hat{\Psi} \in \widehat{\mathit{State}} = \mathit{CtrlPoint} \to \widehat{\mathit{Store}}$

If $\alpha$ is additive,^1 then

$\hat{\mathcal{S}}[\![P]\!]\,\hat{\Psi}\,C' = \bigvee_{C} (\alpha \circ [\Delta_{C,C'}] \circ \gamma)\,(\hat{\Psi}\,C)$.

But now we may perform any number of steps before abstracting. For instance,

$\hat{\mathcal{S}}^2[\![P]\!]\,\hat{\Psi}\,C'' = \bigvee_{C,C'} (\alpha \circ [\Delta_{C,C',C''}] \circ \gamma)\,(\hat{\Psi}\,C)$.

Although the size of this join is $O(n^2)$, and in general $O(n^k)$ for k steps, a sensible analysis
would only do this in cases such as straight-line code, where it is known beforehand that only
one control path yields a non-$\bot$ transfer relation.

Thus, an analysis reduces to implementing $\alpha \circ [\Delta_\Gamma] \circ \gamma$ for any control path $\Gamma \in \mathit{CtrlPoint}^{+}$.
This is done with an algorithm $\mathbf{S}$ that describes how any transfer relation maps a pre abstract
store property to a post abstract store property.

$\mathbf{S} \in \mathit{TR} \to \widehat{\mathit{Store}} \to \widehat{\mathit{Store}}$

Then $\mathbf{S}$ is a function that, given a transfer relation $\Delta_\Gamma$ describing control path $\Gamma$, describes how
a store property at the control point at the beginning of $\Gamma$ propagates through $\Gamma$ and yields a
store property at the end of $\Gamma$. Conceptually, because $\Delta_\Gamma$ describes the exact net behavior of $\Gamma$,
the abstraction step only occurs at the end of $\Gamma$, no matter how long $\Gamma$ is.

The following picture describes the paradigm of multi-step abstract interpretation.

control point    store property

C                $\hat{\Psi}\,C$                                       store property at C given by $\hat{\Psi}$
  $\Gamma$                                                             execution through control path C, $\Gamma$, C'
C'               $\mathbf{S}\,\Delta_{C,\Gamma,C'}\,(\hat{\Psi}\,C)$   store property at C' after propagation
                                                                       through C, $\Gamma$, C' and abstraction at C'

Standard abstract interpretation corresponds to the case in which $\Gamma$ is always the empty path,
and so the propagation is through a single step from C to C'.

^1 Usually, $\alpha$ is additive, but otherwise the equality is a property implication and still yields a correct analysis.

As we have described, once one designs the abstract store $\widehat{\mathit{Store}}$, the heart of any program
analysis defined with the standard methodology of abstract interpretation is the design of an
algorithm to “abstractly” interpret each meta-rule of the transition relation $\longmapsto$ on $\widehat{\mathit{Store}}$. In
our methodology, the heart is the design of the $\mathbf{S}$ function, which abstractly interprets the
transfer relations in our language TR over $\widehat{\mathit{Store}}$. We want $\mathbf{S}$ to satisfy

$(\alpha \circ [\Delta] \circ \gamma) \sqsubseteq (\mathbf{S}\,\Delta)$

where $\sqsubseteq$ is pointwise set inclusion. Ideally, the $\sqsubseteq$ would be =, but an analysis may always
safely weaken the properties. Because our methodology works on the universal intermediate
representation of transfer relations, we may at least begin to describe how $\mathbf{S}$ should be defined,
independent of the particular source language or analysis.

We assume that $\widehat{\mathit{Store}}$ is a join semilattice with the false property as its bottom element,
written as $\bot$. We also assume that there is a function in $\mathit{Exp} \to \widehat{\mathit{Store}} \to \widehat{\mathit{Store}}$ that given an
expression e and store property $\hat{\sigma}$ returns a store property (written $e\,?\,\hat{\sigma}$) that is satisfied by all
stores that both satisfy $\hat{\sigma}$ and evaluate e to true. In other words,

$(\sigma \in \hat{\sigma} \wedge e \vdash_\sigma \mathit{true}) \Rightarrow \sigma \in (e\,?\,\hat{\sigma})$.

We also assume that there is a similar function for false:

$(\sigma \in \hat{\sigma} \wedge e \vdash_\sigma \mathit{false}) \Rightarrow \sigma \in (e\,¿\,\hat{\sigma})$

Note that the definitions

$e\,?\,\hat{\sigma} = e\,¿\,\hat{\sigma} = \hat{\sigma}$

trivially satisfy these properties, but in general it may be possible to do better, and so we
provide the facility.

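Now we may partially define $\mathbf{S}$, independent of the particular analysis or choice of $\widehat{\mathit{Store}}$.

$\mathbf{S}\,\Delta\,\bot = \bot$

$\mathbf{S}\,\emptyset\,\hat{\sigma} = \bot$

$\mathbf{S}\,(e\ ?\ \Delta \mid \Delta')\,\hat{\sigma} = (\mathbf{S}\,\Delta\,(e\,?\,\hat{\sigma})) \vee (\mathbf{S}\,\Delta'\,(e\,¿\,\hat{\sigma}))$

The only remaining case is for assignment relations. Therefore, we have the following “recipe”
for the design of a general multi-step abstract interpretation with our methodology.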


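1. Design a join semilattice $\widehat{\mathit{Store}}$ of store properties.

2. Define $e\,?\,\hat{\sigma}$ and $e\,¿\,\hat{\sigma}$ or use the degenerate definitions given above.

3. Define $(\mathbf{S}\,\delta\,\hat{\sigma})$ for any assignment relation $\delta \in \mathit{ATR}$ and store property $\hat{\sigma} \in \widehat{\mathit{Store}}$ such that
$((\alpha \circ [\delta] \circ \gamma)\,\hat{\sigma}) \sqsubseteq (\mathbf{S}\,\delta\,\hat{\sigma})$.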
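
The recipe can be sketched as a small higher-order function in Python. This is a hedged illustration, not the TR language itself: the term constructors `Empty`, `Cond`, and `Assign`, the use of `None` for the bottom store property, and the helper `make_S` are all hypothetical stand-ins. The analysis designer supplies the join (step 1), optionally the two filters (step 2, defaulting to the degenerate definitions), and the treatment of assignment relations (step 3).

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Empty:          # the empty transfer relation
    pass


@dataclass(frozen=True)
class Cond:           # behaves as `then` where `test` holds and as `other` elsewhere
    test: Any
    then: Any
    other: Any


@dataclass(frozen=True)
class Assign:         # an assignment relation, left opaque in this sketch
    payload: Any


BOTTOM = None         # the false store property


def make_S(join, assign, filt_true=lambda e, s: s, filt_false=lambda e, s: s):
    """Build S from the designer-supplied join, assignment case, and filters."""
    def S(delta, store):
        if store is BOTTOM or isinstance(delta, Empty):
            return BOTTOM
        if isinstance(delta, Cond):
            a = S(delta.then, filt_true(delta.test, store))
            b = S(delta.other, filt_false(delta.test, store))
            if a is BOTTOM:
                return b
            if b is BOTTOM:
                return a
            return join(a, b)
        return assign(delta.payload, store)
    return S


# Toy instantiation: a store property maps each variable to a set of base signs.
def join(a, b):
    return {x: a[x] | b[x] for x in a}


def assign(payload, store):
    var, n = payload  # hypothetical payload: assign the constant n to var
    sign = "-" if n < 0 else "0" if n == 0 else "+"
    return {**store, var: frozenset(sign)}


S = make_S(join, assign)
delta = Cond("c", Assign(("x", 1)), Assign(("x", -1)))
print(S(delta, {"x": frozenset("0")}))
```

With the degenerate filters both branches of the conditional are explored, so the result records that x may be positive or negative; a sharper filter that returns the bottom property for one branch removes that branch's contribution from the join.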


Then,  as  we  described  above,  one  may  perform  a  classical  abstract  interpretation  by  using  the 
single-step  transfer  relations  defined  by  the  semantics  of  the  source  language,  or  one  may  choose 
to  compose  these  transfer  relations  for  better  precision  over  selected  control  paths.  It  is  up  to 
the analysis designer to pick which control paths are of interest, but we suggest a strategy of
composing all paths in the control-flow graph of the source program in which only the first
and last nodes have multiple incoming edges (candidates for looping points) and performing
an abstract interpretation to compute the properties (via fixed-point iteration) for just those
nodes. Such paths roughly correspond to so-called extended basic blocks [ASU86]. Note that
such a strategy would automatically compose the sequence of three assignment statements that
posed a problem in the example at the beginning of this chapter. All that remains to solve that
example, for instance, is to adapt an existing store analysis such as [GH96].


8.7  Value  Analysis 


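In this section, we isolate a subcase of our methodology for value analyses. The sign analysis
of Section 1.1 and Section 8.4 is a simple kind of value analysis. Recall that a store is a function
from l-value to value. In a value analysis, a store property is defined in terms of a set $\widehat{\mathit{Val}}$ of
value properties (sets) as follows.

$\hat{\sigma} \in \widehat{\mathit{Store}} = \widehat{\mathit{Lval}} \to \widehat{\mathit{Val}}$

$\hat{w} \in \widehat{\mathit{Lval}} = \mathit{Var} \cup (\widehat{\mathit{Val}} \times \widehat{\mathit{Val}})$

A member of $\widehat{\mathit{Lval}}$ specifies an l-value property (set) as follows.

$x \in x \qquad\qquad \dfrac{v \in \hat{v} \qquad v' \in \hat{v}'}{(v.v') \in (\hat{v}.\hat{v}')}$

The abstraction function

$\alpha \in \mathcal{P}(\mathit{Store}) \to \widehat{\mathit{Store}}$

is defined as

$\alpha\,\Sigma\,\hat{w} = \bigsqcap \{ \hat{v} \in \widehat{\mathit{Val}} \mid (\sigma \in \Sigma \wedge w \in \hat{w}) \Rightarrow (\sigma\,w) \in \hat{v} \}$

given a lattice $\widehat{\mathit{Val}}$ of value properties.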

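In our example of sign analysis, $\widehat{\mathit{Val}} = \mathit{Sign}$ as given in Section 1.1, and we assumed that
the only l-values were variables (in other words, Lval = Var).

The main algorithm that the analysis designer must provide for a particular value analysis
is a function

$\mathbf{P} \in \mathit{Primop} \to \widehat{\mathit{Store}} \to \widehat{\mathit{Val}}^{*} \to \widehat{\mathit{Val}}$

that “abstractly evaluates” primitive operations. It must satisfy the following condition.

Condition 2 (Safety of P) It must be the case that

$\Big( \bigwedge_{1 \le i \le n} v_i \in \hat{v}_i \Big) \wedge (\sigma \in \hat{\sigma}) \wedge (p(v_1, \ldots, v_n) \vdash_\sigma v)$

implies that

$v \in \mathbf{P}[\![p]\!]\,\hat{\sigma}\,(\hat{v}_1, \ldots, \hat{v}_n)$.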
Note that the store parameter of $\mathbf{P}$ may be ignored for context-insensitive primitive operations,
which will typically make up the vast majority of primitive operations in any application of our
analysis methodology. For instance, the following is a partial definition of $\mathbf{P}$ for the
(context-independent) operation + for sign analysis (where we omit the unused store parameter):

    P[+] (pos, int)    = int
    P[+] (pos, nonneg) = pos
    P[+] (pos, nonpos) = int
    P[+] (pos, pos)    = pos
    P[+] (pos, zero)   = pos
    P[+] (pos, neg)    = int
    P[+] (pos, none)   = none
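
The table rows can be derived mechanically rather than written by hand. The Python sketch below is one way to do so, assuming the sign lattice consists of exactly the seven properties that appear in the table; `SIGNS`, `add_base`, `name_of`, and `abstract_add` are hypothetical names, and the store parameter is omitted as in the table.

```python
# Each sign property denotes a set of base signs; "none" is bottom, "int" is top.
SIGNS = {
    "none": frozenset(), "neg": frozenset("-"), "zero": frozenset("0"),
    "pos": frozenset("+"), "nonpos": frozenset("-0"), "nonneg": frozenset("0+"),
    "int": frozenset("-0+"),
}


def add_base(a, b):
    """Possible base signs of m + n when m has base sign a and n has base sign b."""
    if a == "0":
        return {b}
    if b == "0":
        return {a}
    if a == b:
        return {a}
    return {"-", "0", "+"}  # opposite signs: the sum can be anything


def name_of(members):
    """Strongest sign property whose meaning contains all of `members`."""
    fits = [s for s, m in SIGNS.items() if set(members) <= m]
    return min(fits, key=lambda s: len(SIGNS[s]))


def abstract_add(s1, s2):
    """Abstract evaluation of + on two sign properties."""
    out = set()
    for a in SIGNS[s1]:
        for b in SIGNS[s2]:
            out |= add_base(a, b)
    return name_of(out)


for s in ("int", "nonneg", "nonpos", "pos", "zero", "neg", "none"):
    print("pos +", s, "=", abstract_add("pos", s))
```

Safety in the sense of Condition 2 holds here because every concrete sum of members of the two arguments has a base sign included in the result.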


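We mechanically extend this notion of abstract evaluation to expressions and l-expressions as
follows.

$\mathbf{E} \in \mathit{Exp} \to \widehat{\mathit{Store}} \to \widehat{\mathit{Val}}$

$\mathbf{L} \in \mathit{Lexp} \to \widehat{\mathit{Store}} \to \widehat{\mathit{Lval}}$

$\mathbf{E}[\![x]\!]\,\hat{\sigma} = \hat{\sigma}\,x$

$\mathbf{E}[\![p(e_1, \ldots, e_n)]\!]\,\hat{\sigma} = \mathbf{P}[\![p]\!]\,\hat{\sigma}\,(\mathbf{E}[\![e_1]\!]\,\hat{\sigma}, \ldots, \mathbf{E}[\![e_n]\!]\,\hat{\sigma})$

$\mathbf{L}[\![x]\!]\,\hat{\sigma} = x$

$\mathbf{L}[\![e.e']\!]\,\hat{\sigma} = (\mathbf{E}[\![e]\!]\,\hat{\sigma}).(\mathbf{E}[\![e']\!]\,\hat{\sigma})$

Lemma 10 For all $e \in \mathit{Exp}$ and $v \in \mathit{Val}$,

$(\sigma \in \hat{\sigma} \wedge e \vdash_\sigma v) \Rightarrow v \in \mathbf{E}[\![e]\!]\,\hat{\sigma}$,

and for all $l \in \mathit{Lexp}$ and $w \in \mathit{Lval}$,

$(\sigma \in \hat{\sigma} \wedge l \vdash_\sigma w) \Rightarrow w \in \mathbf{L}[\![l]\!]\,\hat{\sigma}$.

Proof: Straightforward induction based on the safety condition of $\mathbf{P}$. □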


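The next step in our “recipe” is to provide the two boolean filter functions, which are defined
as follows.

$e\,?\,\hat{\sigma} = \begin{cases} \hat{\sigma} & \text{if } \mathit{true} \in (\mathbf{E}[\![e]\!]\,\hat{\sigma}) \\ \bot & \text{otherwise} \end{cases}$

$e\,¿\,\hat{\sigma} = \begin{cases} \hat{\sigma} & \text{if } \mathit{false} \in (\mathbf{E}[\![e]\!]\,\hat{\sigma}) \\ \bot & \text{otherwise} \end{cases}$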


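The final step is to provide the function that describes how, given an assignment relation $\delta$,
one store property evolves into another from the assignments in $\delta$. To do this, we must assume
that we have the following operations on $\widehat{\mathit{Val}}$.

$!\hat{v}$                 test if $\hat{v} \in \widehat{\mathit{Val}}$ is a singleton set

$\hat{v} \bowtie \hat{v}'$   test of nonempty intersection in $\widehat{\mathit{Val}}$: $\hat{v} \cap \hat{v}' \neq \emptyset$

We extend the last two to $\widehat{\mathit{Lval}}$ mechanically as follows.

$!x$                       (all variables are singletons)

$!(\hat{v}.\hat{v}')$      if $!\hat{v} \wedge\ !\hat{v}'$

$x \bowtie x$              (all variables intersect with themselves)

$\hat{v}_1.\hat{v}_1' \bowtie \hat{v}_2.\hat{v}_2'$   if $(\hat{v}_1 \bowtie \hat{v}_2) \wedge (\hat{v}_1' \bowtie \hat{v}_2')$

Now we can define the $\mathbf{S}$ algorithm.

$\mathbf{S}\,(l_1, \ldots, l_n \mapsto e_1, \ldots, e_n)\,\hat{\sigma}\,\hat{w} = \begin{cases} \mathbf{E}[\![e_j]\!]\,\hat{\sigma} & \text{if } (\hat{w} \bowtie \hat{w}') \wedge\ !\hat{w} \wedge\ !\hat{w}' \text{ where } \hat{w}' = \mathbf{L}[\![l_j]\!]\,\hat{\sigma} \\ (\hat{\sigma}\,\hat{w}) \vee \bigvee \{ \mathbf{E}[\![e_i]\!]\,\hat{\sigma} \mid \hat{w} \bowtie \mathbf{L}[\![l_i]\!]\,\hat{\sigma} \} & \text{otherwise} \end{cases}$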


Some important kinds of analyses are not value analyses. A good example is the shape
analysis with which we began this chapter. We leave the adaptation of such analyses to TR as
future work.


Chapter  9 


Analyzing  Expressions 


In Chapters 2 and 3, we gave a framework for designing a language of transfer relations
parameterized by a set Primop of primitive operations, and we gave an algorithm to compose one
transfer relation $\Delta$ with another $\Delta'$ to get a third transfer relation $\Delta''$ = $\Delta$ © $\Delta'$. In Chapter 4
we further described a semantic methodology in which

• $\Delta_\Gamma$ is a term representing the net behavior (modeled as a modification to a store) of any
segment of execution through control path $\Gamma$ (a string of control points),

• $\Delta_{\Gamma'}$ is a term representing the net behavior of a segment corresponding to control path
$\Gamma'$, and

• if $\Gamma$ ends with the same control point with which $\Gamma'$ begins, $\Delta_{\Gamma;\Gamma'}$ = $\Delta_\Gamma$ © $\Delta_{\Gamma'}$ represents
the net behavior of the first execution segment followed by the second execution segment.
(Recall that a control path $\Gamma_1, C$ ending with control point C may be combined with a
control path $C, \Gamma_2$ beginning with C with the ; operation as $(\Gamma_1, C); (C, \Gamma_2) = \Gamma_1, C, \Gamma_2$.)
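
The ; operation on control paths can be written down directly. The following Python sketch assumes that a control path is represented as a non-empty tuple of control-point names; `join_paths` is a hypothetical helper, not a name from the original development.

```python
def join_paths(p1, p2):
    """Combine (G1, C) and (C, G2) into (G1, C, G2).

    The operation is defined only when p1 ends with the control point
    with which p2 begins; the shared point appears once in the result.
    """
    if not p1 or not p2 or p1[-1] != p2[0]:
        raise ValueError("first path must end where the second path begins")
    return p1 + p2[1:]


print(join_paths(("A", "B", "C"), ("C", "D")))
```

The check mirrors the side condition of the third bullet: two transfer relations are composed only when their control paths share the connecting control point.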

For program analysis, composing transfer relations thus forms the core of reasoning about adjacent pieces of code, or even pieces of code that are not syntactically adjacent, but that might follow each other in an execution. An example of the latter is a piece of code that leads to a function call, followed by a piece of code at the beginning of the function body. We formalized these concepts in Chapter 4, which presented a methodology of programming-language semantics based on transfer relations. This methodology provides a uniform way to compute the transfer relation for any finite control path in a program. We demonstrated this methodology in Chapters 5, 6, and 7 for imperative and functional programming languages.

In this chapter, we show how one can use this methodology to analyze how the values of an expression change during program execution. This will turn out to be relatively straightforward for fixed finite control sequences, but rather subtle for executions that are infinite, or at least potentially infinite.


9.1  Analyzing  Finite  Control  Paths

Given a program in any language that has been defined using the semantic methodology of Chapter 4, and given any finite control path $\pi \in \mathit{CtrlPoint}^{+}$, one can compute a transfer relation $\Delta_\pi \in \mathit{TR}$ that gives a description of how a store at the start of $\pi$ can change into a store at the end of $\pi$. If all of the primitive operations in the language are deterministic, as is the case for MINI-C, Pure, and Impure, and as will likely be the case for any reasonable language, then $\Delta_\pi$ will be an exact description, in that it defines the semantics of the control path $\pi$. In Chapter 3, we called this a translation. If the language includes nondeterministic primitive operations, then $\Delta_\pi$ will not necessarily be an exact description, but it will be a superset of the semantics of the control path $\pi$; in other words, all possible executions along control path $\pi$ will be represented in $\Delta_\pi$. In Chapter 3, we called this an upper approximation. In this chapter, we will assume that all primitive operations in the source language are deterministic.

Much of the field of static program analysis is centered around the common motivation of analyzing the values of variables or expressions. Usually this is done with a fixed-point computation, as in abstract interpretation. But in this chapter we present an alternate approach.

Suppose that one wants information about the value of $x \in \mathit{Var}$ when execution reaches the end of control path $\pi$. In our methodology, one computes

$$E\ x\ \Delta_\pi.$$

The result is an expression $e$ such that in any possible execution fragment through control path $\pi$, $e$ at the beginning of that execution fragment is semantically equivalent to $x$ at the end of that execution fragment. If Primop is nondeterministic, then $e$ is an upper approximation of $x$, in that it is guaranteed to evaluate before the execution fragment to any value to which $x$ evaluates after the execution fragment. In general, $x$ can be an arbitrary expression; it need not be a variable. In other words,

$$E\ e'\ \Delta_\pi$$

is an expression $e$ such that in any possible execution fragment through control path $\pi$, $e$ at the beginning of that execution fragment is semantically equivalent to $e'$ at the end of that execution fragment. These properties are not new; they are simply rephrasings of Theorem 2. But because this notion of expression analysis is a central part of this application of our methodology, we introduce a new term.

Definition  16  (Transfer of expressions)  If

$$e \vdash_{\sigma} v \;\implies\; e' \vdash_{\sigma'} v$$

whenever execution from store $\sigma \in \mathit{Store}$ through control path $\pi \in \mathit{CtrlPoint}^{+}$ results in store $\sigma' \in \mathit{Store}$, then we say that $\pi$ transfers $e$ to $e'$.

The  following  theorems  and  corollary  are  the  keystone  of  this  chapter. 


Theorem  5  For any programming language all of whose primitive operations are deterministic, if $\Delta_\pi$ is the transfer relation for control path $\pi$ and $(E\ e'\ \Delta_\pi) = e$, then $\pi$ transfers $e$ to $e'$.

Proof:  Rephrasing  of  Theorem  2.  □ 

Theorem  6  If $\pi$ transfers $e$ to $e'$ and $\pi'$ transfers $e'$ to $e''$ then $\pi;\pi'$ transfers $e$ to $e''$.

Proof:  Straightforward  from  the  definition  of  expression  transfer.  □ 

Corollary  2  For any programming language all of whose primitive operations are deterministic, if $\Delta_\pi$ is the transfer relation for control path $\pi$ and $\Delta_{\pi'}$ is the transfer relation for control path $\pi'$, and if $(E\ (E\ e'\ \Delta_{\pi'})\ \Delta_\pi) = e$, then $\pi;\pi'$ transfers $e$ to $e'$.
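The notion of expression transfer and Theorem 6 can be illustrated on concrete stores. The following is a minimal sketch, not the symbolic machinery of this dissertation: it assumes a heap-free store modeled as a Python dictionary, control paths modeled as store-to-store functions, and expressions modeled as functions from a store to a value. The names `transfers`, `seq`, `inc_x`, and `set_y` are hypothetical, and checking a finite sample of stores is only a test, not a proof.

```python
def transfers(path, e_before, e_after, stores):
    """Check on sample stores that e_before at the start equals e_after at the end."""
    return all(e_before(s) == e_after(path(s)) for s in stores)

def seq(p1, p2):
    """Sequential combination of two paths (the ';' operation on control paths)."""
    return lambda s: p2(p1(s))

def inc_x(s):
    """Path for the assignment  x := x + 1."""
    return {**s, "x": s["x"] + 1}

def set_y(s):
    """Path for the assignment  y := x * 2."""
    return {**s, "y": s["x"] * 2}

# A small sample of initial stores (an assumption made for testing only).
stores = [{"x": x, "y": y} for x in range(-3, 4) for y in range(-3, 4)]

def e(s):            # (x + 1) * 2
    return (s["x"] + 1) * 2

def e_prime(s):      # x * 2
    return s["x"] * 2

def e_second(s):     # y
    return s["y"]

first = transfers(inc_x, e, e_prime, stores)              # inc_x transfers e to e'
second = transfers(set_y, e_prime, e_second, stores)      # set_y transfers e' to e''
both = transfers(seq(inc_x, set_y), e, e_second, stores)  # so the combined path transfers e to e''
```

Here `both` holds because the output expression of the first path is exactly the input expression of the second, which is the linking condition that Theorem 6 relies on.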

9.2  Analyzing  Adjacent  Loop  Iterations  via  Exponentiation 

Consider the MINI-C program

while e do s

that repeats the execution of the code s until e becomes false. Suppose s is simply a piece of straight-line code, and $\pi$ is the control path that tests if e is true and then performs s. As described in Chapter 5, one can automatically compute a transfer relation $\Delta_\pi$ for $\pi$ that represents the net behavior of the test of e and execution of s. So,

•  $\Delta_\pi$ represents the net behavior of any single iteration,

•  $\Delta_\pi \odot \Delta_\pi$ represents the net behavior of control path $\pi;\pi$, which is the control path of any two adjacent iterations,

•  $\Delta_\pi \odot \Delta_\pi \odot \Delta_\pi$ represents the net behavior of control path $\pi;\pi;\pi$, which is the control path of any three adjacent iterations,

•  etc.

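We adopt the notation $\Delta^n$ to mean

$$\underbrace{\Delta \odot \cdots \odot \Delta}_{n\ \text{times}}$$

and the notation $\pi^n$ to mean

$$\underbrace{\pi ; \cdots ; \pi}_{n\ \text{times}}$$

Then $\Delta_{\pi^n} = (\Delta_\pi)^n$.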

Suppose that one wants to analyze how the data in a store $\sigma$ at the beginning of the loop body gets used and updated over a period of three iterations of the loop, that is, during any segment of execution along the control path $\pi^3$. Then one simply computes $(\Delta_\pi)^3$, which gives a description of the store three iterations later¹ in terms of $\sigma$.

It is worth pointing out that this is a fundamentally different (and fundamentally advantageous) approach from those of standard program analyses. Standard approaches cannot distinguish among an unbounded number of loop iterations, but the above approach can. For instance, suppose the loop is

while x <> nil do x := x.tl

that traverses x down a list to its end. The semantic methodology in Chapter 4 would compute


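$$\Delta_\pi \;=\; \boxed{\; x \mathrel{<>} \mathtt{nil}? \quad x \mapsto x.\mathtt{tl} \;}$$

to describe one iteration of this loop. One could then compute, for instance,

$$(\Delta_\pi)^2 \;=\; \boxed{\begin{array}{l} x \mathrel{<>} \mathtt{nil}? \\ x.\mathtt{tl} \mathrel{<>} \mathtt{nil}? \\ x \mapsto x.\mathtt{tl}.\mathtt{tl} \end{array}}$$

to describe two adjacent iterations, or

$$(\Delta_\pi)^3 \;=\; \boxed{\begin{array}{l} x \mathrel{<>} \mathtt{nil}? \\ x.\mathtt{tl} \mathrel{<>} \mathtt{nil}? \\ x.\mathtt{tl}.\mathtt{tl} \mathrel{<>} \mathtt{nil}? \\ x \mapsto x.\mathtt{tl}.\mathtt{tl}.\mathtt{tl} \end{array}}$$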

to describe three adjacent iterations.² The last transfer relation directly provides the following information: During any three adjacent iterations of the loop, the third component of the list to which x is bound before those iterations is bound to x after those iterations. This is computed and formalized by the E algorithm; one computes:

$$\begin{array}{rcl} E\ x\ \Delta_\pi &=& x.\mathtt{tl} \\ E\ x\ (\Delta_\pi)^2 &=& x.\mathtt{tl}.\mathtt{tl} \\ E\ x\ (\Delta_\pi)^3 &=& x.\mathtt{tl}.\mathtt{tl}.\mathtt{tl} \end{array}$$

Therefore:

•  Whenever the loop goes through any one iteration, the value of x.tl (guaranteed by Lemma 2 to be unique because all primitive operations are deterministic) at the beginning of the iteration is equal to the value of x at the end of the iteration. In other words, $\pi$ transfers x.tl to x.

•  Whenever the loop goes through any two adjacent iterations, the unique value of x.tl.tl at the beginning of those iterations is equal to the value of x at the end of those iterations. In other words, $\pi^2$ (which is $\pi;\pi$) transfers x.tl.tl to x.

•  Whenever the loop goes through any three adjacent iterations, the value of x.tl.tl.tl (again, guaranteed to be unique) at the beginning of those iterations is equal to the value of x at the end of those iterations. In other words, $\pi^3$ (which is $\pi;\pi;\pi$) transfers x.tl.tl.tl to x.

¹ In general, one might want to examine all of the transfer relations that accumulated during a left-associative calculation of $(\Delta_\pi)^3$, in other words, the transfer relations that correspond to each prefix of $\pi^3$. These intermediate transfer relations give descriptions of the store with respect to $\sigma$ at all of the intermediate points of execution during a sequence of three iterations, as described in Chapter 4.

² These computations assume trivial translations C and P that merely reconstruct their terms.


In this sense, our methodology provides a way of distinguishing between a potentially unbounded number of occurrences of the same control path. Even if the length of the list to which x is bound at the entry of the while loop is unknown, and thus unbounded, the above transfer relation provides precise information about how the binding of x at iteration k relates to the binding of x at iteration k + 3, and this information is valid for any k. Most approaches to program analysis that are based on fixed-point calculation would ultimately have to approximate the data structure to which x is bound at the entry of the while loop, and therefore inherently cannot produce precise information for an unbounded number of iterations.
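The exponentiation of a single-iteration transfer relation can be sketched symbolically for the list-traversal loop. The code below is an illustrative simplification and not the TR language or the composition algorithm of Chapters 2 and 3: it assumes a transfer relation is just a list of guards plus a map of variable updates, and it treats `.tl` as a read-only unary operation, which is only sound here because this loop never writes to the heap. The names `E`, `compose`, `power`, and `show` are hypothetical.

```python
from functools import reduce

# Expressions: a variable or constant name (str), or a tuple (op, arg, ...).

def E(e, updates):
    """Rewrite e, read at the end of a path, into an expression over the start store."""
    if isinstance(e, str):
        return updates.get(e, e)          # unmodified names stand for themselves
    return (e[0],) + tuple(E(a, updates) for a in e[1:])

def compose(d1, d2):
    """Net effect of d1 followed by d2; guards of d2 are re-expressed over the start store."""
    (g1, u1), (g2, u2) = d1, d2
    guards = g1 + [E(g, u1) for g in g2]
    updates = {**u1, **{v: E(rhs, u1) for v, rhs in u2.items()}}
    return guards, updates

def power(d, n):
    """Left-associative n-fold composition of d with itself (n >= 1)."""
    return reduce(compose, [d] * n)

def show(e):
    if isinstance(e, str):
        return e
    if e[0] == "tl":
        return show(e[1]) + ".tl"
    return f"{show(e[1])} {e[0]} {show(e[2])}"

# One iteration of:  while x <> nil do x := x.tl
one_iteration = ([("<>", "x", "nil")], {"x": ("tl", "x")})

guards, updates = power(one_iteration, 3)
guard_text = [show(g) for g in guards]
x_after_three = show(E("x", updates))
```

Under these assumptions, `guard_text` lists the three accumulated tests and `x_after_three` is the start-store expression that three iterations transfer to x.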


9.3  The  Interaction  Between  Effects  and  Exponentiation 


Above, we gave some intuition about how to analyze the value of x in the loop

while x <> nil do x := x.tl.

First, one computes the transfer relation

$$\Delta \;=\; \boxed{\; x \mathrel{<>} \mathtt{nil}? \quad x \mapsto x.\mathtt{tl} \;}$$

describing one iteration of the loop. Then, one can compute

$$E\ x\ \Delta = x.\mathtt{tl}$$

to automatically determine that one iteration of the loop transfers x.tl to x. Indeed, one can go further by computing

$$\Delta^3 = \Delta \odot \Delta \odot \Delta$$

describing three adjacent loop iterations, and then

$$E\ x\ \Delta^3 = x.\mathtt{tl}.\mathtt{tl}.\mathtt{tl}$$

to yield the expected result that, of course, if x traverses one element down the list in one iteration, then it must traverse three elements down the list in three iterations. Or must it? In this program, it is indeed the case, but in general a result such as

$$E\ x\ \Delta = x.\mathtt{tl}$$

can be deceptive.

To see why, consider the following MINI-C program.

while x <> nil do
{
    y := x;
    x := x.tl;
    y.tl := x.tl
}

One would think that this program is designed to modify the list bound to x by splitting it into two lists: a list of the odd elements in order and a list of the even elements in order. The following transfer relation describes one iteration of the loop.

$$\Delta \;=\; \boxed{\; x \mathrel{<>} \mathtt{nil}? \quad y,\ x,\ x.\mathtt{tl} \;\mapsto\; x,\ x.\mathtt{tl},\ x.\mathtt{tl}.\mathtt{tl} \;}$$

Now, suppose that one wants to analyze the value of x in this loop. As for the previous program, one could compute

$$E\ x\ \Delta = x.\mathtt{tl}$$

to automatically determine that one iteration of the loop transfers x.tl to x; in other words, x progresses to its next element in one iteration. So far, this looks just like the previous program. Going further, one could examine the value of x after two iterations:

$$E\ x\ \Delta^2 = x.\mathtt{tl}.\mathtt{tl}$$

This may not seem very surprising. After all, if x moves down one element of the list in one iteration, then it seems reasonable that it moves down two elements in two iterations. However, the pattern is broken with the next iteration:

$$E\ x\ \Delta^3 = \mathtt{if}(x.\mathtt{tl}.\mathtt{tl} = x,\ \ x.\mathtt{tl}.\mathtt{tl},\ \ x.\mathtt{tl}.\mathtt{tl}.\mathtt{tl})$$

In other words, suppose that the list is a two-element circular list with elements $v_1$ and $v_2$, and x is bound to $v_1$. Then:

•  After one iteration, x is bound to $v_2$ (i.e., element {2, 4, 6, ...} of the list).

•  After two iterations, x is bound to $v_1$ (i.e., element {1, 3, 5, ...} of the list).

•  After three iterations, x is still bound to $v_1$.

Otherwise, no matter what other kind of circularity or aliasing may be present in the list, x progresses to the fourth element in its linked structure after three iterations.


Note that those two possible behaviors are truly distinct. To illustrate, consider how each of the two cases appears in a store graph. The first case is

[Store graph: x is bound to $v_1$; a tl edge leads from $v_1$ to $v_2$ and a tl edge leads from $v_2$ back to $v_1$.]

where $v_1$ may or may not be equal to $v_2$. The second case is

[Store graph: x is bound to $v_1$; tl edges lead from $v_1$ to $v_2$, from $v_2$ to $v_3$, and from $v_3$ to $v_4$.]

where we impose only that $v_1$ and $v_3$ are nonequal (and thus $v_1$ and $v_2$ must be nonequal). After three iterations, x points to $v_1$ in the first case and $v_4$ in the second case. But there is no single non-conditional expression that works for both cases. The expression

x.tl.tl

works for the first case, but not the second; the expression

x.tl.tl.tl

works for the second case, but not the first. If $v_4$ is equal to $v_1$, then the expression x would work for both cases, but $v_4$ may not be equal to $v_1$. The E algorithm automatically distinguishes the cases that need to be distinguished and builds a conditional expression that covers all cases.
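The conditional result for three iterations can be checked by brute force on small concrete stores. This is a sketch under stated assumptions rather than the symbolic E algorithm: nodes are integers, the store is a dictionary mapping each node to its tl successor, and only stores in which every node has a successor are enumerated, so the loop guard `x <> nil` always holds. The helper names `step`, `run`, `deref`, `predicted`, and `check_all` are hypothetical.

```python
from itertools import product

def step(tl, x):
    """One loop iteration: y := x; x := x.tl; y.tl := x.tl (guard assumed true)."""
    y = x
    x = tl[x]
    new_tl = dict(tl)
    new_tl[y] = tl[x]          # the right-hand side is read before the write
    return new_tl, x

def run(tl, x, n):
    """Value of x after n iterations starting from store tl."""
    for _ in range(n):
        tl, x = step(tl, x)
    return x

def deref(tl, v, n):
    """Follow n tl links from v in store tl."""
    for _ in range(n):
        v = tl[v]
    return v

def predicted(tl, x):
    """if(x.tl.tl = x, x.tl.tl, x.tl.tl.tl), evaluated in the initial store."""
    two, three = deref(tl, x, 2), deref(tl, x, 3)
    return two if two == x else three

def check_all(num_nodes=4):
    """Compare simulation and prediction on every total tl function over num_nodes nodes."""
    nodes = range(num_nodes)
    return all(
        run(dict(enumerate(succ)), 0, 3) == predicted(dict(enumerate(succ)), 0)
        for succ in product(nodes, repeat=num_nodes)
    )

circular = {0: 1, 1: 0}                 # two-element circular list, x bound to node 0
chain = {0: 1, 1: 2, 2: 3, 3: 3}        # four distinct elements, last one self-linked
```

Running `run(circular, 0, 3)` reproduces the first store graph (x ends on its original node), while `run(chain, 0, 3)` reproduces the second (x ends on the fourth element).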


9.4  Blowup  of  Conditional  Expressions 

These conditional expressions can become large. Let

$$x.\mathtt{tl}^n \;=\; x\underbrace{.\mathtt{tl}\cdots.\mathtt{tl}}_{n\ \text{times}}$$

After four iterations of the loop in the previous section, we have

$$E\ x\ \Delta^4 = \mathtt{if}(e = x.\mathtt{tl},\ \ e,\ \ \mathtt{if}(e = x,\ x.\mathtt{tl}^2,\ e.\mathtt{tl}))$$

where

$$e = \mathtt{if}(x.\mathtt{tl}^2 = x,\ x.\mathtt{tl}^2,\ x.\mathtt{tl}^3).$$

Unlike the expression $(E\ x\ \Delta^3)$ for three iterations, this expression is rather complex for human understanding (although still much easier than hand-generating all initial aliasing conditions that might be relevant and hand-executing four iterations of the loop under all such cases). Some study uncovers the following interpretation for the value of x after four iterations:

•  If the second element points to the first (special case: the first element points to itself), then the value is x.

•  Otherwise, if the third element points to the second (special case: the second element points to itself), then the value is x.tl.

•  Otherwise, if the third element points to the first, then the value is $x.\mathtt{tl}^2$.

•  Otherwise, the value is $x.\mathtt{tl}^4$.

Again, these are all distinct behaviors, and it is probable that the above itemized list is the shortest description of the value of x after four iterations. But clearly the expression computed as $(E\ x\ \Delta^4)$ is bigger than this itemized list. A better symbolic evaluation of if could reduce its size. For instance, note that as long as all primitive operations are deterministic, the following two expressions are semantically equivalent:

$$\mathtt{if}(e_1 = e_2,\ e_1,\ e_3) \;\equiv\; \mathtt{if}(e_1 = e_2,\ e_2,\ e_3)$$

Therefore, if can substitute one for the other, and for instance choose the small x.tl over the large e above, yielding instead:

$$E\ x\ \Delta^4 = \mathtt{if}(e = x.\mathtt{tl},\ \ x.\mathtt{tl},\ \ \mathtt{if}(e = x,\ x.\mathtt{tl}^2,\ e.\mathtt{tl}))$$

This is better, but not by much. The key is to distribute the conditional expression e nested in the condition position of $(E\ x\ \Delta^4)$ over the two branches of the latter. For this, we have the following rule of semantic equivalence of expressions, where $e\{S\}$ denotes any expression that can be derived from $e$ by optionally replacing occurrences of any subexpression $e_1 \in S$ in $e$ by some other expression $e_2 \in S$:

$$\frac{\begin{array}{c} e = \mathtt{if}(e_1, e_2, e_3) \\ e_5' = e_5\{e, e_2, e_4\} \qquad e_6' = e_6\{e, e_2\} \\ e_5'' = e_5\{e, e_3, e_4\} \qquad e_6'' = e_6\{e, e_3\} \end{array}}{\mathtt{if}(e = e_4,\ e_5,\ e_6) \;\equiv\; \mathtt{if}(e_1,\ \mathtt{if}(e_2 = e_4,\ e_5',\ e_6'),\ \mathtt{if}(e_3 = e_4,\ e_5'',\ e_6''))} \qquad (9.1)$$

As written, this rule is nondeterministic because there are in general many choices for an expression $e\{S\}$. But one obvious strategy is simply to pick the smallest expression. This is easy to implement. For instance, to compute $e_5'$, pick the smallest of $\{e, e_2, e_4\}$ and substitute all occurrences in $e_5$ of the two larger expressions by this small expression.

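Applying this rule to $(E\ x\ \Delta^4)$ yields:

$$\begin{array}{l} E\ x\ \Delta^4 = \mathtt{if}(x.\mathtt{tl}^2 = x, \\ \qquad \mathtt{if}(x.\mathtt{tl}^2 = x.\mathtt{tl},\ x.\mathtt{tl},\ x.\mathtt{tl}^2), \\ \qquad \mathtt{if}(x.\mathtt{tl}^3 = x.\mathtt{tl}, \\ \qquad\qquad x.\mathtt{tl}, \\ \qquad\qquad \mathtt{if}(x.\mathtt{tl}^3 = x,\ x.\mathtt{tl}^2,\ x.\mathtt{tl}^4))) \end{array}$$

This is close to optimal, but it is possible to simplify the first arm even further using the following generalization of the first semantic equivalence above:

$$\frac{e_3' = e_3\{e_1, e_2\}}{\mathtt{if}(e_1 = e_2,\ e_3,\ e_4) \;\equiv\; \mathtt{if}(e_1 = e_2,\ e_3',\ e_4)} \qquad (9.2)$$

Applying this rule to the above expression yields:

$$\begin{array}{l} E\ x\ \Delta^4 = \mathtt{if}(x.\mathtt{tl}^2 = x, \\ \qquad \mathtt{if}(x = x.\mathtt{tl},\ x.\mathtt{tl},\ x), \\ \qquad \mathtt{if}(x.\mathtt{tl}^3 = x.\mathtt{tl}, \\ \qquad\qquad x.\mathtt{tl}, \\ \qquad\qquad \mathtt{if}(x.\mathtt{tl}^3 = x,\ x.\mathtt{tl}^2,\ x.\mathtt{tl}^4))) \end{array}$$

Applying the rule again to the conditional in the first arm yields:

$$\begin{array}{l} E\ x\ \Delta^4 = \mathtt{if}(x.\mathtt{tl}^2 = x, \\ \qquad \mathtt{if}(x = x.\mathtt{tl},\ x,\ x), \\ \qquad \mathtt{if}(x.\mathtt{tl}^3 = x.\mathtt{tl}, \\ \qquad\qquad x.\mathtt{tl}, \\ \qquad\qquad \mathtt{if}(x.\mathtt{tl}^3 = x,\ x.\mathtt{tl}^2,\ x.\mathtt{tl}^4))) \end{array}$$

Finally, we apply the equivalence that

$$\mathtt{if}(e_1 = e_2,\ e,\ e) \;\equiv\; e \qquad (9.3)$$

to yield:

$$\begin{array}{l} E\ x\ \Delta^4 = \mathtt{if}(x.\mathtt{tl}^2 = x, \\ \qquad x, \\ \qquad \mathtt{if}(x.\mathtt{tl}^3 = x.\mathtt{tl}, \\ \qquad\qquad x.\mathtt{tl}, \\ \qquad\qquad \mathtt{if}(x.\mathtt{tl}^3 = x,\ x.\mathtt{tl}^2,\ x.\mathtt{tl}^4))) \end{array}$$

This last expression directly corresponds to the bullet list above, and no more simplifications seem possible; the behavior of four iterations of the loop on the binding of x is inherently this complex.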
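The last simplification steps can be sketched as a small rewriting function. The code below is an illustration under assumptions, not the symbolic evaluator of this dissertation: it implements only Rule 9.2 with the "pick the smallest expression" strategy, applied to the then-branch of each conditional, followed by Rule 9.3, and it does not implement the distribution Rule 9.1. Conditionals are encoded as tuples `("if", a, b, then, else)` standing for if(a = b, then, else), and the names `tl`, `size`, `subst`, and `simp` are hypothetical.

```python
def tl(e, n=1):
    """Build e.tl^n."""
    for _ in range(n):
        e = ("tl", e)
    return e

def size(e):
    """Number of nodes in an expression tree."""
    return 1 if isinstance(e, str) else 1 + sum(size(c) for c in e[1:])

def subst(e, old, new):
    """Replace every occurrence of the subexpression old in e by new."""
    if e == old:
        return new
    if isinstance(e, str):
        return e
    return (e[0],) + tuple(subst(c, old, new) for c in e[1:])

def simp(e):
    """Simplify conditionals at the top of e (conditionals nested under .tl are left alone)."""
    if isinstance(e, str) or e[0] != "if":
        return e
    _, a, b, then, other = e
    small, large = sorted((a, b), key=size)
    then = simp(subst(then, large, small))   # Rule 9.2: inside the then-branch, a = b holds
    other = simp(other)
    if then == other:                        # Rule 9.3: both branches agree
        return then
    return ("if", a, b, then, other)

x = "x"
second_arm = ("if", tl(x, 3), tl(x), tl(x),
              ("if", tl(x, 3), x, tl(x, 2), tl(x, 4)))
# The expression obtained after applying Rule 9.1:
after_rule_9_1 = ("if", tl(x, 2), x,
                  ("if", tl(x, 2), tl(x), tl(x), tl(x, 2)),
                  second_arm)

simplified = simp(after_rule_9_1)
expected = ("if", tl(x, 2), x, x, second_arm)
```

With this encoding, `simplified` coincides with `expected`, the final form whose first arm has collapsed to x.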

Rule 9.3 is subtle; it works only because the primitive operation = returns either true or false on any pair of values, even undef.

Rule 9.1 of semantic equivalence may seem ad hoc, but it is actually more widely applicable than it may seem at first. Let us look again at the computation of $(E\ x\ \Delta^4)$, but this time with the observation that $(E\ x\ \Delta^3)$ appears in some of its subexpressions.

$$\begin{array}{l} E\ x\ \Delta^4 = \mathtt{if}((E\ x\ \Delta^3) = x.\mathtt{tl}, \\ \qquad (E\ x\ \Delta^3), \\ \qquad \mathtt{if}((E\ x\ \Delta^3) = x,\ x.\mathtt{tl}^2,\ (E\ x\ \Delta^3).\mathtt{tl})) \end{array}$$

Now note that the nested conditional in Rule 9.1 occurs exactly where $(E\ x\ \Delta^3)$ appears above. This is not an accident; often, the behavior of a piece of code (e.g., one iteration of a loop) on an expression (e.g., a variable) will function in more than one possible way depending on properties (e.g., aliasing) of the result of the preceding piece of code (e.g., the previous loop iteration) on that expression. Hence, it is often the case that $(E\ e\ \Delta)$ will appear in the proper position (i.e., as $e$) in an application of Rule 9.1 to $(E\ e\ (\Delta \odot \Delta'))$. The rule thus serves to incrementally keep the expressions as flat as possible.


The previous sections showed how to automatically compute that a single iteration of the loop

while x <> nil do x := x.tl

and a single iteration of the loop

while x <> nil do
{
    y := x;
    x := x.tl;
    y.tl := x.tl
}

both transfer x.tl to x.

It is not difficult to see that for any n, n iterations of the first loop transfer $x.\mathtt{tl}^n$ to x, but we did not give an algorithm to compute this closed-form solution. However, we demonstrated that three iterations of the second loop do not transfer $x.\mathtt{tl}^3$ to x, and therefore it is not the case that for any n, n iterations of the second loop transfer $x.\mathtt{tl}^n$ to x. This section addresses the question of when such exponentiations are valid, and how to automatically compute a closed-form representation for those exponentiations. The results that we will achieve are much more general than the simple traversal of a linear data structure.


9.5.1  An  example 

We begin at an intuitive level by examining why the closed-form exponentiation works for the first loop, but not for the second. Consider two adjacent iterations of the first loop. We know that iteration 1 transfers x.tl to x and iteration 2 transfers x.tl to x. But to link iteration 1 with iteration 2, the "output expression" of iteration 1 should be the same as the "input expression" of iteration 2. So what we really need is to compute

$$E\ (x.\mathtt{tl})\ \Delta = x.\mathtt{tl}.\mathtt{tl}$$

which reports that any one iteration (where $\Delta$ is the transfer relation for a single iteration) transfers x.tl.tl to x.tl. Now we know that iteration 1 transfers x.tl.tl to x.tl, which is then transferred by iteration 2 to x. This is an application of Corollary 2.

Thus, we have simply verified that two iterations transfer x.tl.tl to x, a fact that is expressed more directly by

$$E\ x\ \Delta^2 = x.\mathtt{tl}.\mathtt{tl}.$$

But deriving that result in the two steps of Corollary 2 instead of computing it directly suggests an approach for deriving a closed-form solution for n steps. Note that in the equation

$$E\ (x.\mathtt{tl})\ \Delta = x.\mathtt{tl}.\mathtt{tl}$$

the expression on the right is a dereference of x.tl by tl, the expression on the left is a dereference of x by tl, and we already have that one iteration transfers x.tl to x. Suppose an oracle magically provides the statement that for all expressions $e$ and $e'$, if one iteration transfers $e$ to $e'$, then it must be the case that it also transfers $e.\mathtt{tl}$ to $e'.\mathtt{tl}$. In other words,

$$E\ e'\ \Delta = e \;\implies\; E\ (e'.\mathtt{tl})\ \Delta = e.\mathtt{tl}.$$

Then by induction, we have

$$\begin{array}{rcll} E\ x\ \Delta &=& x.\mathtt{tl} & \text{base case} \\ E\ (x.\mathtt{tl})\ \Delta &=& x.\mathtt{tl}^2 & \text{application of oracle to above} \\ E\ (x.\mathtt{tl}^2)\ \Delta &=& x.\mathtt{tl}^3 & \text{application of oracle to above} \\ E\ (x.\mathtt{tl}^3)\ \Delta &=& x.\mathtt{tl}^4 & \text{application of oracle to above} \end{array}$$

And then by another induction, we have

$$\begin{array}{rcll} E\ x\ \Delta &=& x.\mathtt{tl} & \text{base case} \\ E\ x\ \Delta^2 &=& x.\mathtt{tl}^2 & \text{Corollary 2 with above line and line 2 of previous result} \\ E\ x\ \Delta^3 &=& x.\mathtt{tl}^3 & \text{Corollary 2 with above line and line 3 of previous result} \\ E\ x\ \Delta^4 &=& x.\mathtt{tl}^4 & \text{Corollary 2 with above line and line 4 of previous result} \end{array}$$

The key to this approach is the oracle that provides the statement that

$$E\ e'\ \Delta = e \;\implies\; E\ (e'.\mathtt{tl})\ \Delta = e.\mathtt{tl}.$$

Now it becomes clear why the closed-form exponentiation works for the first program, but not for the second. The intuition of this statement is that "the tl fields of all data structures are preserved by a single iteration". This is clearly true of the first program, which does not perform any assignments to tl fields. But the second program includes the statement

y.tl := x.tl

which potentially alters the tl field of some value in the store. And indeed, the oracle's statement is false when $\Delta$ is the transfer relation of one iteration of the second program. At first, it seems tricky to find an expression $e'$ that makes the statement fail. Neither x nor y serves the purpose, but x.tl does:

$$E\ (x.\mathtt{tl})\ \Delta = x.\mathtt{tl}.\mathtt{tl}$$

but

$$E\ (x.\mathtt{tl}.\mathtt{tl})\ \Delta = \mathtt{if}(x.\mathtt{tl}.\mathtt{tl} = x,\ \ x.\mathtt{tl}.\mathtt{tl},\ \ x.\mathtt{tl}.\mathtt{tl}.\mathtt{tl})$$

Fortunately, however, there is a general technique for testing if the above oracle statement holds. The insight is that an expression that cannot possibly be altered by the program can act as a "probe" into any point in the store. So one needs merely to choose $e'$ to be a variable z that does not appear in the program. It will always be the case that $(E\ z\ \Delta) = z$, and if the oracle statement fails for any $e'$ then it will fail for z. Furthermore, if the oracle statement passes for $e = e' = z$ then it will pass for all expressions $e$ and $e'$. For instance, for both of our programs,

$$E\ z\ \Delta = z,$$

but while

$$E\ (z.\mathtt{tl})\ \Delta = z.\mathtt{tl}$$

for the first program, thus implying that the oracle statement holds and thus the closed-form exponentiation is valid,

$$E\ (z.\mathtt{tl})\ \Delta = \mathtt{if}(z = x,\ \ x.\mathtt{tl}.\mathtt{tl},\ \ z.\mathtt{tl})$$

for the second program, thus demonstrating that the oracle statement fails and thus the closed-form exponentiation is not valid.

The  above  is  merely  an  example  of  exponentiating  a  tl  dereference.  Now  we  generalize 
these  results  to  a  much  larger  class  of  exponentiations. 
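The probe idea has a simple concrete counterpart: because a fresh variable z may be bound to any node, the tl field is preserved by an iteration exactly when no node changes its successor. The sketch below is a test on one sample store under assumptions, not the symbolic check described above: nodes are integers, the store maps each node to its tl successor, and the names `first_loop`, `second_loop`, and `probe` are hypothetical.

```python
def first_loop(tl, x):
    """One iteration of:  x := x.tl  (the heap is not written)."""
    return dict(tl), tl[x]

def second_loop(tl, x):
    """One iteration of:  y := x; x := x.tl; y.tl := x.tl."""
    y, x = x, tl[x]
    new_tl = dict(tl)
    new_tl[y] = tl[x]          # read x.tl before writing y.tl
    return new_tl, x

def probe(step, tl, x):
    """Nodes z for which z.tl differs before and after one iteration."""
    new_tl, _ = step(tl, x)
    return [z for z in tl if new_tl[z] != tl[z]]

store = {0: 1, 1: 2, 2: 3, 3: 3}   # sample store, x bound to node 0
x = 0

changed_by_first = probe(first_loop, store, x)
changed_by_second = probe(second_loop, store, x)

# Compare the second loop with the symbolic result if(z = x, x.tl.tl, z.tl).
new_store, _ = second_loop(store, x)
matches_conditional = all(
    new_store[z] == (store[store[x]] if z == x else store[z]) for z in store
)
```

On this store the first loop changes nothing, while the second loop changes the successor of the node that x was bound to, which is the aliasing case `z = x` of the conditional.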


9.5.2  Expression  constructors 

We begin with the observation that $x.\mathtt{tl}^n$ is the result of n repeated applications of the function

$$\lambda e.\ (e.\mathtt{tl})$$

to the expression x. Exponentiating a dereference is thus a special case of exponentiating a function of type

$$\mathit{Exp} \to \mathit{Exp}.$$

In this section, we present a foundation for these kinds of functions and how, given a loop, to automatically find such functions that can be exponentiated in the loop.

Definition  17  The set $\mathit{ExpCon}_k$ of expression constructors of arity k is defined as follows.

$$\mathcal{E} \in \mathit{ExpCon}_k \;::=\; x \;\mid\; p(\mathcal{E}_1, \ldots, \mathcal{E}_n) \;\mid\; \textcircled{1} \;\mid\; \cdots \;\mid\; \textcircled{k}$$

$\mathit{ExpCon}_k$ is isomorphic to $\mathit{Exp}^k \to \mathit{Exp}$, and these may be used interchangeably.

Intuitively, a k-ary expression constructor is an expression in which "holes" may appear, each hole labeled with a number from 1 to k. A hole may appear multiple times or not at all. When a k-ary expression constructor is applied to k expressions $(e_1, \ldots, e_k)$, then each occurrence of hole $\textcircled{i}$ is "filled" with $e_i$. Note that $\mathit{ExpCon}_0$, the set of nullary expression constructors, is just the set $\mathit{Exp}$ of expressions. Also note that $\mathit{ExpCon}_k \supseteq \mathit{Exp}$ for any k because an expression constructor need not contain any holes.

The definition of k-ary expression constructors as $\mathit{Exp}^k \to \mathit{Exp}$ functions is as follows:

$$\begin{array}{rcl} x\ \vec{e} &=& x \\ p(\mathcal{E}_1, \ldots, \mathcal{E}_n)\ \vec{e} &=& p(\mathcal{E}_1\ \vec{e}, \ldots, \mathcal{E}_n\ \vec{e}) \\ \textcircled{i}\ (e_1, \ldots, e_k) &=& e_i \end{array}$$

Unary expression constructors are especially important because they are the only ones that can be exponentiated, as they are the only ones with matching domain and codomain. Because they are distinguished, we simply call them expression constructors and use slightly specialized notation for them:

$$\mathcal{E} \in \mathit{ExpCon} \;::=\; x \;\mid\; p(\mathcal{E}_1, \ldots, \mathcal{E}_n) \;\mid\; \bigcirc \qquad \text{unary expression constructors}$$

We also specialize the definition above for the case of expression constructors as $\mathit{Exp} \to \mathit{Exp}$ functions:

$$\begin{array}{rcl} x\ e &=& x \\ p(\mathcal{E}_1, \ldots, \mathcal{E}_n)\ e &=& p(\mathcal{E}_1\ e, \ldots, \mathcal{E}_n\ e) \\ \bigcirc\ e &=& e \end{array}$$
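Unary expression constructors and their exponentiation can be sketched directly from the three defining equations. This is a minimal illustration under assumptions: expressions are encoded as strings (variables) or tuples `(op, arg, ...)`, the hole is the distinguished value `HOLE`, and the names `fill` and `power` are hypothetical.

```python
HOLE = ("hole",)   # the single hole of a unary expression constructor

def fill(constructor, e):
    """Apply a unary expression constructor to expression e."""
    if constructor == HOLE:
        return e                                   # hole applied to e gives e
    if isinstance(constructor, str):
        return constructor                         # a variable ignores its argument
    op, *args = constructor
    return (op,) + tuple(fill(a, e) for a in args) # push the argument through p(...)

def power(constructor, n, e):
    """Apply the constructor n times to e."""
    for _ in range(n):
        e = fill(constructor, e)
    return e

deref_tl = ("tl", HOLE)          # the constructor that dereferences by tl
three_tails = power(deref_tl, 3, "x")
doubled = fill(("+", HOLE, HOLE), "i")   # a hole may appear more than once
constant = fill(("tl", "x"), "z")        # a constructor need not contain a hole
```

The value `three_tails` is the tuple encoding of x.tl.tl.tl, obtained by three applications of the same constructor, which is the sense in which only unary constructors can be exponentiated.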


In the previous section, we considered loops in which one iteration transfers x.tl to x. We started by calculating

$$E\ x\ \Delta = x.\mathtt{tl}$$

and then computing a test to determine if the ".tl" part could be exponentiated. The discussion in the previous section generalizes elegantly via the following three theorems.


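Theorem  7  (Abstraction of expression transfer)  Let $\Delta_\pi$ be the transfer relation for control path $\pi$, $\mathcal{E}$ and $\mathcal{E}'$ be k-ary expression constructors, and $x_1, \ldots, x_k$ be variables that do not appear in the syntax of $\Delta_\pi$, $\mathcal{E}$, or $\mathcal{E}'$. If

$\pi$ transfers $\mathcal{E}(x_1, \ldots, x_k)$ to $\mathcal{E}'(x_1, \ldots, x_k)$, and
$\pi$ transfers $e_i$ to $e_i'$ for all $i \in \{1, \ldots, k\}$

then $\pi$ transfers $\mathcal{E}(e_1, \ldots, e_k)$ to $\mathcal{E}'(e_1', \ldots, e_k')$.

Proof:  We need to show that whenever $\sigma\ \Delta_\pi\ \sigma'$,

$$(\mathcal{E}(e_1, \ldots, e_k)) \vdash_{\sigma} v \;\implies\; (\mathcal{E}'(e_1', \ldots, e_k')) \vdash_{\sigma'} v.$$

Choose any values $v_1, \ldots, v_k$. Because $\pi$ transfers $e_i$ to $e_i'$ for all $i \in \{1, \ldots, k\}$, we have that

$$\bigwedge_{i=1}^{k} e_i \vdash_{\sigma} v_i \;\implies\; \bigwedge_{i=1}^{k} e_i' \vdash_{\sigma'} v_i.$$

Now let

$$\begin{array}{rcl} \sigma'' &=& \sigma[x_1 \mapsto v_1] \ldots [x_k \mapsto v_k] \\ \sigma''' &=& \sigma'[x_1 \mapsto v_1] \ldots [x_k \mapsto v_k] \end{array}$$

Because none of $x_1, \ldots, x_k$ appears in the syntax of $\Delta_\pi$, we have that

$$\sigma''\ \Delta_\pi\ \sigma'''$$

and hence, because $\pi$ transfers $\mathcal{E}(x_1, \ldots, x_k)$ to $\mathcal{E}'(x_1, \ldots, x_k)$, that

$$(\mathcal{E}(x_1, \ldots, x_k)) \vdash_{\sigma''} v \;\implies\; (\mathcal{E}'(x_1, \ldots, x_k)) \vdash_{\sigma'''} v.$$

Combining the above, we have that

$$\Big(\bigwedge_{i=1}^{k} e_i \vdash_{\sigma} v_i\Big) \wedge (\mathcal{E}(x_1, \ldots, x_k)) \vdash_{\sigma''} v \;\implies\; \Big(\bigwedge_{i=1}^{k} e_i' \vdash_{\sigma'} v_i\Big) \wedge (\mathcal{E}'(x_1, \ldots, x_k)) \vdash_{\sigma'''} v.$$

But because none of $x_1, \ldots, x_k$ appears in $\mathcal{E}$ we have that for any $\sigma$,

$$\exists v_1, \ldots, v_k.\ \Big(\bigwedge_{i=1}^{k} e_i \vdash_{\sigma} v_i\Big) \wedge (\mathcal{E}(x_1, \ldots, x_k)) \vdash_{\sigma[x_1 \mapsto v_1] \ldots [x_k \mapsto v_k]} v \;\iff\; (\mathcal{E}(e_1, \ldots, e_k)) \vdash_{\sigma} v$$

and similarly for $\mathcal{E}'$. Therefore,

$$(\mathcal{E}(e_1, \ldots, e_k)) \vdash_{\sigma} v \;\implies\; (\mathcal{E}'(e_1', \ldots, e_k')) \vdash_{\sigma'} v.$$

□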


The above theorem is used primarily for the next two theorems, which we will use to automatically compute closed-form solutions in loops.


Theorem  8  (Left closed-form exponentiation)  Given a language in which all primitive operations are deterministic, let $\Delta_\pi$ be the transfer relation for control path $\pi$, $\mathcal{E}$ be a (unary) expression constructor, and $x$ be some variable that does not appear in either the syntax of $\Delta_\pi$ or $\mathcal{E}$. If

$$E\ e\ \Delta_\pi = \mathcal{E}\ e$$

and

$$E\ (\mathcal{E}\ x)\ \Delta_\pi = \mathcal{E}\ x$$

then for all $n \ge 0$ and $k \ge 0$, $\pi^n$ transfers $(\mathcal{E}^{(n+k)}\ e)$ to $(\mathcal{E}^{k}\ e)$.

Proof:  Straightforward application of Theorems 5, 6, and 7.  □

Theorem  9  (Right closed-form exponentiation)  Given a language in which all primitive operations are deterministic, let $\Delta_\pi$ be the transfer relation for control path $\pi$, $\mathcal{E}$ be a (unary) expression constructor, and $x$ be some variable that does not appear in either the syntax of $\Delta_\pi$ or $\mathcal{E}$. If

$$E\ (\mathcal{E}\ e)\ \Delta_\pi = e$$

and

$$E\ (\mathcal{E}\ x)\ \Delta_\pi = \mathcal{E}\ x$$

then for all $n \ge 0$ and $k \ge 0$, $\pi^n$ transfers $(\mathcal{E}^{k}\ e)$ to $(\mathcal{E}^{(n+k)}\ e)$.

Proof:  Straightforward application of Theorems 5, 6, and 7.  □

The following example illustrates the above development.

Example  30  Let $\Delta$ be the transfer relation for one iteration of the loop:

while x <> nil do x := x.tl

Note that variable z does not appear in $\Delta$. Let $\mathcal{E} = \bigcirc.\mathtt{tl}$. We compute

$$E\ x\ \Delta = x.\mathtt{tl} = \mathcal{E}\ x$$

and

$$E\ (\mathcal{E}\ z)\ \Delta = E\ (z.\mathtt{tl})\ \Delta = z.\mathtt{tl} = \mathcal{E}\ z.$$

By Theorem 8, we conclude that any n iterations of the loop transfer $\mathcal{E}^n\ x = x.\mathtt{tl}^n$ to x, and further that they transfer $x.\mathtt{tl}^{n+k}$ to $x.\mathtt{tl}^{k}$ for any $k > 0$.

9.5.3  Computing  closed  forms  automatically 

In order to compute these closed forms automatically for expression $e$ and control path $\pi$, it is
necessary to determine automatically an expression constructor $\mathcal{E}$ such that

$$E\;e\;\Delta_\pi = \mathcal{E}\;e$$

where $\Delta_\pi$ is the transfer relation for control path $\pi$. In general, there may be many choices for
$\mathcal{E}$ that make this equation true. For instance, for both of our example while-loops above,

$$E\;x\;\Delta = (\Box.\mathit{tl})\;x$$

and

$$E\;x\;\Delta = (x.\mathit{tl})\;x.$$

To understand this nondeterminism, consider the task of determining from any two expressions
$e$ and $e'$ an expression constructor $\mathcal{E}^{e}_{e'}$ such that $\mathcal{E}^{e}_{e'}\;e = e'$. Using the first two lines of the above
definition of expression constructors as functions, we can derive the following scheme:

$$\mathcal{E}^{e}_{x} = x$$

$$\mathcal{E}^{e}_{p(e_1, \ldots, e_n)} = p(\mathcal{E}^{e}_{e_1}, \ldots, \mathcal{E}^{e}_{e_n})$$

But this merely reduces to the degenerate $\mathcal{E}^{e}_{e'} = e'$, an expression constructor without any holes.
Taking into consideration the third line of the definition, we can add the following equation:

$$\mathcal{E}^{e}_{e} = \Box$$

Now whenever $e'$ matches $e$, we have a choice between applying this equation to introduce a
hole or using the first two to reconstruct $e$.

One  obvious  deterministic  strategy  is  to  introduce  holes  whenever  possible,  yielding  the 
following  algorithm: 

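$$\mathcal{E}^{e}_{e'} = \begin{cases} \Box & \text{if } e = e' \\ x & \text{if } e \neq e' = x \\ p(\mathcal{E}^{e}_{e_1}, \ldots, \mathcal{E}^{e}_{e_n}) & \text{if } e \neq e' = p(e_1, \ldots, e_n) \end{cases}$$

Example 31 The expression-constructor algorithm computes:

$$\mathcal{E}^{x}_{x.\mathit{tl}} = (\mathcal{E}^{x}_{x}).\mathit{tl} = \Box.\mathit{tl}$$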

Example 32 The expression-constructor algorithm computes:


=  a[0  +  01 
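
The hole-introducing strategy described above can be sketched as a short recursive function. This is a minimal illustration under assumptions: expressions are encoded as nested tuples `(op, arg1, ..., argn)` with variables as strings, and the names `Hole`, `constructor`, and `fill` are hypothetical, chosen only for this sketch.

```python
class Hole:
    """Hypothetical marker for the hole of an expression constructor."""
    def __repr__(self):
        return "[]"

HOLE = Hole()

def constructor(e, target):
    """Return a constructor C with fill(C, e) == target, introducing a hole
    wherever a subterm of target is syntactically equal to e."""
    if target == e:
        return HOLE
    if not isinstance(target, tuple):   # a variable or constant other than e
        return target
    op, *args = target
    return (op, *(constructor(e, a) for a in args))

def fill(c, e):
    """Apply constructor c to e by plugging e into every hole."""
    if c is HOLE:
        return e
    if not isinstance(c, tuple):
        return c
    op, *args = c
    return (op, *(fill(a, e) for a in args))

# From x and x.tl the strategy yields the constructor [].tl.
c = constructor("x", ("tl", "x"))
print(c)                                  # ('tl', [])
print(fill(c, "x") == ("tl", "x"))        # True

# A made-up term with two occurrences of i receives two holes.
t = ("index", "a", ("+", "i", "i"))
print(constructor("i", t))                # ('index', 'a', ('+', [], []))
```

Because a hole is introduced at every matching subterm, the result is deterministic, and filling the holes with the original expression always reconstructs the target.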


This  suggests  the  following  algorithm  for  computing  closed  forms  of  expression  transfer  in 
loops. 

Algorithm 1 (Closed-forms of expressions in loops) Given the following input:

• A control path $\pi \in \mathit{CtrlPoint}^{*}$.

• An expression $e \in \mathit{Exp}$.

Perform the following steps:

1. Compute the transfer relation $\Delta_\pi$ for control path $\pi$ as described in Chapter f.

2. Compute the expression $(E\;e\;\Delta_\pi)$. Call this $e'$.

3. Compute the expression constructor $\mathcal{E}^{e}_{e'}$ as described above. Call this $\mathcal{E}_L$.

4. Compute the expression constructor $\mathcal{E}^{e'}_{e}$ as described above. Call this $\mathcal{E}_R$.

5. Choose a variable $x$ not appearing in $\Delta_\pi$.

6. Compute the expression $(\mathcal{E}_L\;x)$ as described by the definition of expression constructors.
Call this $e_L$.

7. Compute the expression $(\mathcal{E}_R\;x)$ as described by the definition of expression constructors.
Call this $e_R$.

8. Compute the expression $(E\;e_L\;\Delta_\pi)$ and test if it is syntactically equal to $e_L$. If so, output
“left($\mathcal{E}_L$)”. Otherwise, output “left-exponentiation not found”.

9. Compute the expression $(E\;e_R\;\Delta_\pi)$ and test if it is syntactically equal to $e_R$. If so, output
“right($\mathcal{E}_R$, $e'$)”. Otherwise, output “right-exponentiation not found”.

If this algorithm, given $\pi$ and $e$, outputs “left($\mathcal{E}_L$)” then for all $n > 0$ and $k > 0$, $\pi^{n}$ transfers
$(\mathcal{E}_L^{(n+k)}\;e)$ to $(\mathcal{E}_L^{k}\;e)$. In addition, if it outputs “right($\mathcal{E}_R$, $e'$)” then for all $n > 0$ and $k > 0$, $\pi^{n}$
transfers $(\mathcal{E}_R^{k}\;e')$ to $(\mathcal{E}_R^{(n+k)}\;e')$.

Example 33 The above algorithm, given the control path corresponding to one iteration of the
program

while x <> nil do x := x.tl

and given the expression x, outputs “left($\Box.\mathit{tl}$)” and “right-exponentiation not found”.
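
Example 33 can be reproduced with a small sketch of steps 2 through 9 of Algorithm 1. The sketch rests on strong assumptions: the transfer relation is modelled as a plain parallel assignment (a dict from variables to expressions, with no filters or heap updates), so that E e Δ is substitution, and all function names are hypothetical. It therefore cannot handle a loop such as the one in Example 34, whose body stores into the heap.

```python
class Hole:
    """Hypothetical marker for the hole of an expression constructor."""
    def __repr__(self):
        return "[]"

HOLE = Hole()

def subst(e, delta):
    """E e delta for an assignment-only relation (assumed model)."""
    if not isinstance(e, tuple):
        return delta.get(e, e)
    op, *args = e
    return (op, *(subst(a, delta) for a in args))

def constructor(e, target):
    """Constructor C with fill(C, e) == target, holes wherever possible."""
    if target == e:
        return HOLE
    if not isinstance(target, tuple):
        return target
    op, *args = target
    return (op, *(constructor(e, a) for a in args))

def fill(c, e):
    """Plug e into every hole of c."""
    if c is HOLE:
        return e
    if not isinstance(c, tuple):
        return c
    op, *args = c
    return (op, *(fill(a, e) for a in args))

def variables(e):
    """Set of variable names occurring in e."""
    if isinstance(e, str):
        return {e}
    if isinstance(e, tuple):
        return set().union(*(variables(a) for a in e[1:]))
    return set()

def closed_forms(delta, e, fresh):
    """Return (left, right); each is None when no exponentiation is found."""
    used = set(delta) | variables(e)
    used |= set().union(*(variables(r) for r in delta.values()))
    if fresh in used:
        raise ValueError("fresh variable must not appear in delta or e")
    e1 = subst(e, delta)                  # step 2: e' = E e delta
    c_left = constructor(e, e1)           # step 3: C_L with C_L e = e'
    c_right = constructor(e1, e)          # step 4: C_R with C_R e' = e
    e_l = fill(c_left, fresh)             # step 6
    e_r = fill(c_right, fresh)            # step 7
    left = c_left if subst(e_l, delta) == e_l else None           # step 8
    right = (c_right, e1) if subst(e_r, delta) == e_r else None   # step 9
    return left, right

delta = {"x": ("tl", "x")}                # x := x.tl
left, right = closed_forms(delta, "x", fresh="z")
print(left, right)                        # ('tl', []) None
```

The left result corresponds to the output “left([].tl)”, and the `None` on the right corresponds to “right-exponentiation not found”.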

Example 34 The above algorithm, given the control path corresponding to one iteration of the
program

while x <> nil do
{
y := x;
x := x.tl;
y.tl := x.tl
}

and given the expression x, outputs “left-exponentiation not found” and “right-exponentiation
not found”.


Part V
Conclusion

In this dissertation, we have presented a new way of approaching the problem of statically
analyzing a program to determine properties of its run-time behavior.

In our methodology, the semantic definition of a language is given by a translation from
the source program to an intermediate form in which all single-step transitions between two
control points are described by a single transfer relation term in TR. For instance, we have
shown how to translate assignments and let-bindings into assignment relations, conditionals
into filter relations, allocation into assignment relations that maintain an explicit heap pointer,
and function calls into filter relations with assignment for argument passing. Our language TR
of transfer relations is thus a universal intermediate representation for programming languages,
parameterized by a set Val of values and a set Primop of primitive operations.

The semantics itself merely defines the single-step transfer relations, which amounts to a
translation of the source program into TR. But the fundamental property of TR that sets it
apart from other intermediate representations and makes it useful for program analysis is that
it is closed under composition. We have given an algorithm © to perform this composition.

Given this view, one way to think of our analysis methodology is as a kind of symbolic
execution. Given the translation of a source program into TR, one uses © to compose the
steps in order to generate a transfer relation (a term in TR) for a particular finite control path.
The single-step transfer relations yielded by the semantics correspond to the length-two control
paths and are simply a rewriting of the program text. But as an analysis composes these steps
with ©, it symbolically uncovers more and more dynamic information about the program.
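
To make the idea of composing single steps concrete, here is a deliberately reduced sketch. It assumes that each step is only a parallel assignment, represented as a dict from variables to expressions. The composition algorithm of this dissertation also handles filters, heap references, aliasing tests, and simplification, none of which are modelled here. The names `subst` and `compose` and the sample path are hypothetical.

```python
from functools import reduce

def subst(e, delta):
    """Rewrite e, read after delta, as an expression over the state before
    delta. Variables are strs, applications are tuples (op, arg1, ..., argn)."""
    if not isinstance(e, tuple):
        return delta.get(e, e)
    op, *args = e
    return (op, *(subst(a, delta) for a in args))

def compose(first, second):
    """Parallel assignment equivalent to running `first` and then `second`."""
    return {v: subst(second.get(v, v), first)
            for v in sorted(set(first) | set(second))}

# Hypothetical straight-line path: x := x + 1; y := x; x := x + y
steps = [{"x": ("+", "x", 1)}, {"y": "x"}, {"x": ("+", "x", "y")}]
path = reduce(compose, steps)
print(path["y"])   # ('+', 'x', 1)
print(path["x"])   # ('+', ('+', 'x', 1), ('+', 'x', 1))
```

Each right-hand side of the composed relation is expressed purely in terms of the state at the start of the path, which is the sense in which composition uncovers dynamic information without discarding any.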

Unlike usual approaches to program analysis that begin by defining an abstract language of
run-time properties, our methodology never discards information about the program. In fact,
given a closed program (in other words, one with no parameters or unknown data), it is possible
actually to execute the program with ©. To do this, build the transfer relation for the control
path that starts at the beginning of the program. At each point during this incremental composition,
all information about the run-time state up to that point will be represented precisely in the
transfer relation, and every time a branch point is reached (for instance, a conditional or function
call), only one branch will result in a non-0 transfer relation. Of course, if the program
never terminates then this process will never terminate. But it demonstrates that our analysis
methodology includes all information needed to perform a precise execution of the source
program, which sets it apart from other approaches to program analysis.

But the point of program analysis is usually to analyze a program or program fragment
that is not closed. One may want to analyze a function relative to its parameters, or a segment
of C code apart from its surrounding context. Or the entire program itself may not be closed
because of unknown input data. It is these situations for which our methodology is designed.
As the analysis symbolically builds the transfer relation for a control path, it may encounter
unknown quantities (variables, heap references, and so forth). The analysis represents these
as expressions and l-expressions in the transfer relations, precisely describing quantities that
are relative to the state of execution at the beginning of the control path. Still, no semantic
information is discarded. If a transfer relation that describes the definition of a variable or
value on the heap is composed with a transfer relation that includes a reference to that variable
or heap location, then the © algorithm will inline the data defined in the first relation into the
references in the second transfer relation and simplify the result.

In short, our primary philosophy is that program analysis should focus on the relationship
between an execution state at the beginning of a given control path and the resulting execution
state at the end of the path. Given this philosophy, our primary technical result is that one
can in fact effectively compute a concise symbolic description of this precise relationship for a
fixed control path.

So, our methodology truly is a general framework for program analysis in the sense that it
involves no abstraction or approximation, and so there is nothing in the framework itself that
necessarily prohibits the computation of any given computable program property. Of course,
this is true of the text of the source program itself! But repeated applications of © reveal more
and more dynamic information about the source program, and in the limit actually represent
the entire program execution.

One may view repeated applications of © as an imperative analog of the reduction of terms
in the λ-calculus. The redex rules of the λ-calculus are symbolic, just like the composition of
transfer relations, and repeated reductions of a λ-term reveal in some sense more and more
dynamic information about the original term. The reduction may terminate in a unique (up to
α-conversion) normal form, which is a canonical representation of the original term. One may
think of repeated applications of © in a similar way, gradually moving toward a more canonical
representation of the source program, in principle resulting in a single TR-term in the limit. In
our case, these normal forms are not unique. But this is not surprising, given the wide variety
of languages that we can describe in this way: languages with assignment, heap-allocated data
structures, mutable arrays and records, and pointers.

A new methodology of program analysis opens up numerous avenues for future work. In
this dissertation, we have just begun to explore the applications of our methodology to real
analysis problems, but there is much more work to be done. Some thoughts:

• There is potential for our methodology to help in software development as a debugging
tool. The transfer relation terms in TR have an intuitive presentation as symbolic conditional
parallel assignments. Imagine, for instance, dragging a mouse through an execution
path in a source program (around loops, down conditionals, into function calls, and so
forth) and watching the transfer relation describing the precise behavior of that path
build up incrementally. There may arise a few terms in the transfer relation that would
correspond to the internals of the semantics rather than anything appearing explicitly
in the source program (the variable H that we used as a heap pointer in the MINI-C
semantics, for instance), and the user would have to learn these. But for the most part,
the output would appear quite natural. Such a debugger would not only be useful to help
understand what the code does, but could catch bugs or potential sources of bugs. In particular,
because © computes the precise composition, it is guaranteed to cover all aliasing
possibilities, and these are revealed as aliasing tests in the resulting transfer relation.

• We have isolated the problem with existing approaches to program analysis that they
construct an abstract property a single step at a time, often resulting in dramatic loss
of precision. If an existing analysis is retooled to work for TR, then this fundamental
limitation is eliminated because the analysis is free to apply © to compose multiple steps
and thus build the abstract property multiple steps at a time. The great advantage to
this idea is that it is completely general and can potentially improve the accuracy of any
existing program analysis. For any given existing program analysis, however, there are
the following practical issues.

- Retooling the analysis for TR. The TR language is fairly simple, but still contains
parallel assignment, and most existing analyses would need to be extended to handle
parallel assignment. For some analyses, this is probably not difficult, and we
described a general approach for value analyses. But for others, such as shape analyses,
a general solution may be more difficult.

- Designing a strategy of when to apply © to build multiple steps and when to apply
the analysis on those compound steps. The obvious general approach to this task is to
use © on control paths whose endpoints are control points with potentially more than
one incoming edge in a control graph. This is a generalization of composing steps
in a basic block, but still guarantees termination. There may be instances, however,
where further composition would improve precision, and we leave the design of such
strategies as a problem for future work.

• Some analysis problems, such as lifetime analysis and dependency analysis, deal directly
with the relation between one point in the execution and some later point in the execution.
Therefore, these analyses are actually abstractions of transfer relations. This implies
that our methodology may be fundamentally better suited to such problems than the
traditional approach of computing a property of the states reached during execution.

• Our methodology is probably well suited for the analysis of concurrent programs. One
may treat a process-creation point as a branch in the control-flow graph of a program.
Then one may build the transfer relations for each branch of the path, to relate precisely
the data in the old process with the data in the new process, at least up to a certain
finite path of execution in each process. For instance, this is useful for determining that
a communication channel is used in a restricted fashion between two processes.

• We have demonstrated that it is possible to achieve some symbolic closed-form solutions
that track data through loops. There may be other ways in which information about
a loop can be computed symbolically. For instance, if we switch the order of the two
arguments of the $E$ function, yielding functionality

$$E \in \mathit{TR} \to \mathit{Exp} \to \mathit{Exp},$$

then any fixed point of $(E\;\Delta)$ is an expression that evaluates to the same value before
and after transfer relation $\Delta$. If $\Delta$ is the transfer relation of a path through one iteration
of a loop, then these fixed points are loop-invariant expressions, and any binary-valued
fixed point is a loop invariant. The ability to express a single iteration of a loop as a
concise term $\Delta$ gives some hope that there are useful ways to compute effectively these
fixed points.
• Program transformations, such as classical compiler optimizations, are ultimately based
on semantic equivalence of code fragments. One code fragment may be replaced with
another if and only if they are semantically equivalent, for some appropriate notion of
semantic equivalence. We have given a way of producing a term describing the semantic
behavior of any finite control path in the source program. This term is not canonical,
but it is more abstract than the source program itself and thus is more amenable to
reasoning about semantic equivalence. Indeed, syntactic equivalence of composed transfer
relations can be quite useful in practice as an approximation to semantic equivalence.
There is hope that our methodology could form the basis of a generic calculus of program
transformations for use in optimizing compilers for a wide variety of languages, including
imperative languages.
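
The remark on syntactic equivalence of composed transfer relations can be illustrated with a toy model. The sketch assumes assignment-only steps encoded as dicts from variables to tuple expressions; the helpers `subst`, `compose`, `normalize`, and `path_relation` are hypothetical and far simpler than the composition algorithm of this dissertation.

```python
def subst(e, delta):
    """Rewrite e, read after delta, over the state before delta."""
    if not isinstance(e, tuple):
        return delta.get(e, e)
    op, *args = e
    return (op, *(subst(a, delta) for a in args))

def compose(first, second):
    """Parallel assignment equivalent to running `first` and then `second`."""
    return {v: subst(second.get(v, v), first)
            for v in set(first) | set(second)}

def normalize(delta):
    """Drop identity assignments such as x := x."""
    return {v: e for v, e in delta.items() if e != v}

def path_relation(steps):
    """Compose a list of single-step relations into one normalized relation."""
    rel = {}
    for step in steps:
        rel = compose(rel, step)
    return normalize(rel)

# Fragment A: x := x + 1; y := x        Fragment B: y := x + 1; x := y
frag_a = [{"x": ("+", "x", 1)}, {"y": "x"}]
frag_b = [{"y": ("+", "x", 1)}, {"x": "y"}]
print(path_relation(frag_a) == path_relation(frag_b))             # True

# Only an approximation: x := x + 0 behaves like the empty path, but the
# composed terms differ syntactically unless a simplifier is applied.
print(path_relation([{"x": ("+", "x", 0)}]) == path_relation([]))  # False
```

The two fragments differ as program text yet compose to the same term, while the second comparison shows why syntactic equality can only under-approximate semantic equivalence.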